Datasets were searched for using the same filters as before: must be human tissue that exhibits Lewy bodies, must have at least 3 patients and 3 controls, and must…
10 datasets were identified:
| CodeID | Tissue/Cell type | Variant | Samples | Type of array | GEO Ref | Comments | PIs |
|---|---|---|---|---|---|---|---|
| DIJ | Substantia Nigra | not specified | 15 disease, 8 control | Affymetrix Human Genome U133 Plus 2.0 Array | GSE49036 | Has some Braak staging | Dijkstra AA, Ingrassia AI, de Menezes RX, van Kesteren RE, Rozemuller AM, Heutink P, van de Berg WJ |
| FFR | Substantia Nigra | not specified | 9 replicates for the controls and 16 replicates for the Parkinson’s disease patients | Affymetrix Human Genome U133 Plus 2.0 Array | GSE7621 | | Ffrench-Mullen JM |
| MOR.SN | substantia nigra, split into medial and lateral portions | sporadic | 15 samples of medial parkinsonian SN, 9 samples of lateral parkinsonian SN (24), 8 medial nigra control samples and 7 lateral nigra control samples(15) | Affymetrix Human Genome U133A Array | GSE8397 | Age and sex | Moran LB, Graeber MB |
| MID1 | Substantia Nigra pars compacta | not specified | 10 PD brain samples and 8 control | Affymetrix Human Genome U133 Plus 2.0 Array | GSE20141 | | Middleton FA, Kim PD, Zhang-James Y, Davis RL |
| MID2 | whole substantia nigra | not specified | 18 controls, 11 patients | Affymetrix Human Genome U133A Array | GSE20292 | Age and sex | Middleton FA, James M, Zhang Y, Davis RL |
| DUM | Cortex (BA9) | not specified | 29 PD, 44 neurologically normal controls | Illumina HiSeq 2000 | GSE68719 | Have age of death, sex, and some samples have corresponding proteomics | Dumitriu A, Golji J, Labadorf AT, Gao B, Beach TG, Myers RH, Longo KA, Latourelle JC |
| MOR.FC | Frontal cortex | sporadic | 3 Controls 5 Patients | Affymetrix Human Genome U133A Array | GSE8397 | Age and sex | Moran LB, Graeber MB |
| LEW | two regions of the medulla | not specified | 14 PD samples and 8 controls (two regions) | Affymetrix Human Genome U133A 2.0 Array | GSE19587 | Age and sex | Lewandowski NM, Small SA |
| MID3 | prefrontal cortex | not specified | 15 controls 14 patients | Affymetrix Human Genome U133A Array | GSE20168 | Age and sex | Middleton FA, James M, Zhang Y, Davis RL |
| MID4 | putamen | not specified | 20 controls and 15 patients | Affymetrix Human Genome U133A Array | GSE20291 | Age and sex | Middleton FA, James M, Zhang Y, Davis RL |
Data distribution was checked using PCA and boxplots. Some of the datasets have quite mixed samples, whereas others separate quite nicely into controls and patients. The boxplots didn’t suggest that any samples were outliers.
As with the TDP-43 data, the microarray data were analysed using limma. The genes were divided into positive and negative fold change. Intersected gene lists can be found in /Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/FoldChange.
| ~~~ | ~~~ | ~~~ | ~~~ | ~~~ | ~~~ | ~~~ | ~~~ | ~~~ | ~~~ | ~~~ |
|---|---|---|---|---|---|---|---|---|---|---|
| A4GNT | C3 | CPED1 | FAM46A | HCK | KDM4A | LY6G5C | NRDE2 | RBMS2 | SMIM14 | TNFRSF10B |
| ABCC6 | C3AR1 | CPM | FAM60A | HDC | KDM4C | LYVE1 | NYX | RCC1 | SNAP23 | TNFRSF10D |
| AC012065.7 | C8orf60 | CPSF3L | FANCE | HERC2P3 | KIAA0101 | LYZL6 | OR1F1 | RDH16 | SND1-IT1 | TNFRSF25 |
| ACP5 | C9orf3 | CROCC | FAS | HIST1H1T | KIAA0226L | MAF | ORM1 | RHBDF2 | SOGA1 | TNFSF14 |
| ACSBG2 | CALML4 | CROCCP3 | FASLG | HIST1H2AB | KIAA0485 | MAFF | OSM | RIBC2 | SORBS1 | TNN |
| ACSS3 | CASP1 | CSPG4 | FCGBP | HIST1H2AD | KIAA1614 | MAP2K7 | P4HA1 | RIPK2 | SOS2 | TP73-AS1 |
| ADH1B | CASP2 | CTBP2 | FCGR2A | HMOX1 | KIAA1661 | MAPKBP1 | PAGE1 | RNASE6 | SOX12 | TRAF1 |
| ADIPOQ | CASP4 | CTD-2269F5.1 | FCGR2B | HNRNPM | KIF13A | MAVS | PAPOLA | RP11-217B7.2 | SOX2 | TRAPPC10 |
| ADORA3 | CASP6 | CXCR4 | FCRL2 | HOXB8 | KLHL36 | MBD5 | PARP16 | RP11-255C15.3 | SOX9 | TRIM21 |
| AF007147 | CASP7 | CXorf21 | FGF17 | HP1BP3 | KLK2 | MBTD1 | PAWR | RP11-403P17.4 | SP140L | TRIM5 |
| AGAP10 | CASP8 | CXorf36 | FGF3 | HS3ST1 | KRI1 | MCM3AP-AS1 | PCDH12 | RP11-548H18.2 | SPG21 | TTLL5 |
| AGRP | CATSPER2 | CYP21A1P | FGFR1 | HSPA1A | KRT19P2 | MECOM | PCDHGA8 | RPA4 | SPINK1 | TUBD1 |
| ALOX5 | CCDC101 | CYP39A1 | FKBP10 | HSPA1L | L2HGDH | MED13L | PDLIM1 | RPL23AP32 | SPTLC3 | TULP2 |
| ALOX5AP | CCDC15 | CYTH4 | FLCN | HSPA6 | LAG3 | METTL4 | PELI2 | RPL23AP53 | SRP19 | TYMP |
| ALPK1 | CCDC40 | DAPP1 | FLJ21369 | HSPD1 | LAMB2 | METTL7A | PGF | RREB1 | SRSF5 | TYROBP |
| AMELX | CCL17 | DBF4B | FLT1 | IFNA2 | LAT2 | MFAP5 | PGPEP1 | RRNAD1 | SSTR3 | UGGT1 |
| AMELY | CCL22 | DCLRE1C | FLT3LG | IFNA21 | LECT1 | MICB | PIDD1 | RUNX1-IT1 | ST20 | UPK3B |
| AMH | CCL27 | DDR1-AS1 | FMO5 | IGF1R | LEF1 | MID1 | PIGO | RUNX3 | ST6GAL1 | USP34 |
| ANGPT2 | CD14 | DENND2D | FNDC3B | IGFBP5 | LENEP | MILR1 | PILRA | S100A11P1 | STAG3L3 | VPS54 |
| ANP32A-IT1 | CD163 | DFFB | FOXD1 | IGKV1-17 | LEPREL1 | MIR612 | PIM1 | S100A4 | STIP1 | VSIG4 |
| AP1G2 | CD207 | DIP2A | FOXL2 | IGLL3P | LEPREL4 | MKRN4P | PLAC8 | SAFB2 | STK10 | WBSCR16 |
| APOBEC3C | CD22 | DISC1 | FPR1 | IGSF9B | LGALS9 | MS4A6A | PLEKHF2 | SAT1 | STOM | WDR4 |
| APOC2 | CD247 | DKFZP434A062 | FYB | IL12RB1 | LILRA2 | MSH5 | PLGLB1 | SCAF4 | TAF4 | WDR52 |
| AQP8 | CD300A | DMC1 | FZD5 | IL16 | LINC00115 | MSX1 | PLIN2 | SCD5 | TAL1 | WDR55 |
| ARHGAP25 | CD37 | DNA2 | FZD7 | IL17RB | LINC00260 | MT1M | PLK3 | SCIN | TAS2R10 | WDR78 |
| ARHGDIB | CD48 | DNAJB1 | FZD9 | IL18 | LINC00894 | MYH7B | PLOD2 | SCNN1B | TAS2R13 | XRCC2 |
| ARHGEF4 | CD68 | DNAJB6 | G0S2 | IL1R1 | LINC00963 | MYO1C | PLTP | SDCCAG3 | TAZ | ZBED2 |
| ARID3A | CD84 | DNASE2 | G3BP1 | IL21 | LLGL2 | MZT2B | PODNL1 | SERPINA1 | TBX6 | ZC3HAV1 |
| ATAD2B | CDA | DOCK10 | GABPA | INE1 | LOC100272216 | NABP1 | POGLUT1 | SERPINH1 | TBXAS1 | ZFP36L1 |
| ATHL1 | CDC14A | DOCK2 | GAL3ST4 | INPP5B | LOC100506699 | NACAP1 | POGZ | SERTAD3 | TCF3 | ZMYND8 |
| ATP8B1 | CDK2AP2 | DSE | GAS1 | INPP5D | LOC101928198 | NBEAL2 | PPP1R14D | SH2B2 | TFAP2B | ZNF14 |
| ATXN7 | CEBPD | DTYMK | GATA1 | INVS | LOC101928274 | NBR2 | PPP2R1B | SHMT1 | TFEC | ZNF217 |
| AVIL | CEL | EEF1DP5 | GCKR | IRF4 | LOC101929240 | NCKAP1L | PRR11 | SIGLEC7 | TFPI | ZNF224 |
| AZGP1 | CHD2 | EFNA4 | GDF15 | IRF6 | LOC101929889 | NEUROG1 | PRRG4 | SIGLEC8 | TGFBR3 | ZNF235 |
| AZGP1P1 | CHKB-CPT1B | EGF | GDF9 | IRF7 | LOC102724229 | NFATC3 | PSCA | SIM2 | TGIF1 | ZNF34 |
| AZU1 | CHRNG | ELF4 | GEM | ITFG2 | LOC102725016 | NFE2 | PSG11 | SIX6 | TIMM44 | ZNF354A |
| BANP | CHST4 | EMC10 | GJA9-MYCBP | ITGB6 | LOC284009 | NHLH1 | PTPN14 | SLA | TLR5 | ZNF430 |
| BC069804 | CLEC2B | ENTPD1 | GJD2 | ITPR3 | LOC729164 | NOTCH2NL | PTPRCAP | SLAMF8 | TM4SF1 | ZNF446 |
| BMP8B | CLIC2 | ENTPD7 | GK2 | JMJD6 | LOC79999 | NOX3 | RAB13 | SLC11A1 | TMEM176A | ZNF516 |
| BRPF1 | CLK1 | EP300 | GP9 | KCNE4 | LPAR2 | NPIPA1 | RAC2 | SLC27A3 | TMEM19 | ZNF665 |
| BTBD18 | COL16A1 | EPOR | GPR25 | KCNJ8 | LPAR4 | NPIPB15 | RAD52 | SLC2A5 | TMEM51 | ZNF670 |
| C10orf12 | COL5A1 | ETV1 | GRAMD3 | KCNK5 | LPP | NPL | RAD54L | SLC35C2 | TMPRSS11E | ZNF710 |
| C11orf16 | COL7A1 | FAM106A | GTSE1 | KCNQ1 | LRRC1 | NPR3 | RBL1 | SLC7A9 | TMPRSS5 | ZNF721 |
| C1QB | COL9A1 | FAM118A | GUSBP3 | KCTD5 | LRRFIP1 | NR6A1 | RBM38 | SMAD6 | TNFAIP8 | ZNF74 |
| ZNF783 |
| ~~~ | ~~~ | ~~~ | ~~~ | ~~~ | ~~~ | ~~~ | ~~~ | ~~~ | ~~~ | ~~~ |
|---|---|---|---|---|---|---|---|---|---|---|
| ABCA11P | ATP5D | CLINT1 | EIF2B3 | GPN1 | LOC728392 | MZT2A | PFDN2 | PSMD8 | SLC9A1 | TSSC1 |
| ABCF1 | ATP5F1 | CLIP3 | EIF2S1 | GPX4 | LRBA | NAPA | PFKM | PSME3 | SLITRK5 | TTC19 |
| ABHD11 | ATP5G1 | CLTB | EIF3F | GRAMD1B | LRPPRC | NDEL1 | PFN2 | PSMG1 | SNCA | TTC9 |
| ACAT1 | ATP6AP1 | CMAS | EIF3I | GSS | LYRM4 | NDFIP1 | PGAM1 | PTBP2 | SNRPN | TUBA1B |
| ACAT2 | ATP6V0B | COA3 | EIF3K | GSTA4 | LYRM9 | NDUFA10 | PGRMC1 | PTDSS1 | SNX24 | TUBA1C |
| ACP2 | ATP6V0C | COA7 | EIF6 | GSTO1 | MAGED1 | NDUFA13 | PHB2 | PTP4A1 | SORD | TUBB |
| ACTB | ATP6V1E1 | COMMD9 | ELOVL6 | HARS | MAGED2 | NDUFA3 | PINK1 | PTPRA | SPAG16 | TUBB2A |
| ACTR1A | ATP6V1F | COPS3 | ELP3 | HERC6 | MAK16 | NDUFA7 | PITPNA | RAB14 | SPAG7 | TUBB3 |
| ADAM23 | ATP6V1G2 | COPS4 | EMC7 | HINT1 | MAP1B | NDUFA9 | PITPNB | RABEPK | SPCS1 | TUBG1 |
| AFG3L2 | ATP6V1H | COPS5 | ENDOD1 | HLTF | MAP1LC3B | NDUFAB1 | PITRM1 | RAN | SPCS3 | TUFM |
| AGTPBP1 | AURKAIP1 | COPS7A | ENOPH1 | HMCES | MAPK10 | NDUFAF1 | PKI55 | RANGRF | SPRYD7 | TUSC2 |
| AHCY | AVPI1 | COQ3 | ENSA | HMGN4 | MDH1 | NDUFB5 | PLD3 | RARS | SRPRB | TXNDC15 |
| AIFM1 | B3GNT2 | COX7A2L | ENTPD6 | HNRNPA0 | MDH2 | NDUFS1 | PLEKHB2 | RBFOX2 | SRRD | UBB |
| AIMP2 | BABAM1 | COX8A | ERCC1 | HPCAL4 | ME2 | NDUFS3 | PMPCA | RBM3 | SSSCA1 | UBE2E3 |
| AJAP1 | BECN1 | CREB3 | ERP29 | HSPA12A | ME3 | NDUFS5 | PMS2 | RITA1 | ST3GAL5 | UBE3B |
| AK055981 | BEND5 | CRMP1 | EXOSC9 | IARS | MECR | NDUFV1 | PMS2P1 | RNASEH1 | STAT4 | UCHL1 |
| AK5 | BEX4 | CRY2 | FABP3 | IARS2 | MFN2 | NDUFV2 | PMS2P5 | RNF10 | STRAP | UQCRC1 |
| AKAP12 | BFSP1 | CSNK2A1 | FAF1 | IDH3B | MGST3 | NEDD8 | POLR2C | RPA2 | STS | UQCRFS1 |
| AKAP6 | BPGM | CSRNP2 | FAM127A | IDH3G | MICU1 | NEU1 | POLR3C | RPH3A | STX12 | UQCRQ |
| AKTIP | BRE | CTNNA2 | FAM134C | IDI1 | MIEF1 | NGFRAP1 | POP4 | RPP14 | STXBP1 | UROD |
| ALAS1 | BRF2 | CUL2 | FAM162A | IMP4 | MIF | NHP2 | PPFIA2 | RPS6KC1 | SUMO3 | UROS |
| ALDH1A1 | BSN | CX3CL1 | FAM168A | INSIG1 | MIR21 | NIF3L1 | PPIA | RRAGA | SYN1 | USP7 |
| ANXA2 | BTBD3 | CXorf40A | FAM206A | IQSEC1 | MIR3656 | NIPSNAP1 | PPME1 | RTCB | SYT11 | UTP18 |
| ANXA6 | C11orf49 | CXorf40A | FBXW2 | IRGQ | MIR4784 | NIT2 | PPP1R7 | RTF1 | TACO1 | VAMP7 |
| AP1S1 | C12orf10 | CXorf40B | FDPS | ITFG1 | MIR636 | NME1 | PPP2CA | RTN4 | TCEA2 | VLDLR |
| AP2M1 | C1orf216 | CYC1 | FECH | ITPA | MIR6890 | NOP16 | PPP2R1A | RWDD2B | TCEAL2 | VPS41 |
| AP2S1 | C21orf33 | CYP2E1 | FHIT | ITPR1 | MLH1 | NRXN3 | PPP2R2B | SAMM50 | TCEAL4 | VPS4B |
| AP3M2 | C3orf62 | DCTN3 | FIBP | KATNB1 | MMP24-AS1 | NSDHL | PPP2R5D | SARS | TFCP2 | VPS51 |
| AP3S2 | C5orf30 | DCTN6 | FIG4 | KHDRBS1 | MOCS2 | NUDCD3 | PPP3CB | SCAMP5 | TFPT | WBP11 |
| APEH | C6orf106 | DCTPP1 | FKBP1B | KIAA0391 | MOSPD1 | NUDT2 | PRC1 | SCFD1 | TIMM10 | WDR44 |
| APEX1 | CCBL2 | DDA1 | FLRT1 | KIAA0513 | MPI | NUP155 | PRKCZ | SDHA | TIMM10B | WDR61 |
| APLP2 | CCDC51 | DDHD2 | FRMPD4 | KIAA1279 | MRPL15 | ODC1 | PRMT8 | SDHAF1 | TIMM8B | WDR7 |
| APMAP | CCK | DDX1 | FTO | KIF2A | MRPL35 | ODF2 | PRPF4 | SDHAP1 | TMED3 | WDR77 |
| AREL1 | CCNH | DDX24 | FUCA1 | KIF3C | MRPL4 | OPTN | PRR13 | SEC31A | TMEM177 | WDYHV1 |
| ARF1 | CCSER2 | DDX42 | GABARAPL1 | KIFAP3 | MRPS18A | ORC5 | PSD3 | SEH1L | TMEM208 | YWHAZ |
| ARF3 | CDC123 | DGUOK | GABBR1 | KLHDC4 | MRPS22 | OSBP | PSMA1 | SERINC3 | TMEM246 | ZDHHC4 |
| ARFGAP2 | CDC42 | DHRS11 | GALNT11 | L1CAM | MRPS33 | OXCT1 | PSMA5 | SEZ6L2 | TMEM41B | ZDHHC6 |
| ARHGEF9 | CDK20 | DHRS7B | GARS | LANCL1 | MRPS35 | OXLD1 | PSMB3 | SF3A3 | TMEM97 | ZMYM4 |
| ARL2BP | CDK5 | DIABLO | GBAS | LANCL2 | MRPS7 | PAAF1 | PSMB4 | SF3B5 | TMUB2 | ZNF365 |
| ARMC2-AS1 | CDK7 | DKK3 | GGNBP2 | LBH | MRTO4 | PAFAH1B1 | PSMB5 | SH3BP5 | TOMM20 | ZNF593 |
| ARPC1A | CDS2 | DLD | GLO1 | LCMT1 | MTCH1 | PARL | PSMB6 | SIPA1L1 | TOR1AIP2 | ZNF629 |
| ARPC5L | CERK | DNAJC8 | GLOD4 | LDHA | MTCH2 | PARN | PSMB7 | SLC23A2 | TOX4 | |
| ASS1 | CFL1 | DRG1 | GLT8D1 | LDHB | MTCL1 | PCCB | PSMC1 | SLC25A11 | TP53BP1 | |
| ASTN2 | CHCHD2 | DSTN | GMPR2 | LDLRAD4 | MTHFD1 | PCYOX1L | PSMC2 | SLC25A3 | TPGS2 | |
| ATG14 | CHP1 | DUSP26 | GNB1 | LDOC1 | MTPAP | PDCL3 | PSMC5 | SLC25A46 | TRAP1 | |
| ATP1A1 | CHST1 | DYNC1H1 | GNB5 | LETMD1 | MTX2 | PDHB | PSMD1 | SLC25A5 | TRAPPC2L | |
| ATP1A3 | CIAPIN1 | DYNC1I1 | GOT1 | LINC00094 | MX1 | PDK2 | PSMD11 | SLC25A6 | TSG101 | |
| ATP2B2 | CIRBP | DZIP3 | GOT2 | LMO3 | MYH10 | PDZD8 | PSMD14 | SLC30A9 | TSPAN3 | |
| ATP5A1 | CITED1 | EAPP | GPD1L | LOC101927673 | MYL12B | PEX14 | PSMD2 | SLC35B1 | TSPAN7 | |
| ATP5B | CLCN6 | EIF2AK1 | GPI | LOC645166 | MYL5 | PEX19 | PSMD7 | SLC6A1 | TSR2 |
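The split by fold-change direction and the intersection across datasets can be sketched roughly as follows. This is a minimal illustration rather than the FoldChange script itself: the toy data frames, the gene names and the `split_by_direction` helper are all made up, and only the `Gene.Symbol` and `logFC` column names follow the limma output used in these notes.

```r
# Hypothetical helper: split one result table into up- and down-regulated gene symbols.
split_by_direction <- function(res) {
  list(up   = unique(res$Gene.Symbol[res$logFC > 0]),
       down = unique(res$Gene.Symbol[res$logFC < 0]))
}

# Toy stand-ins for per-dataset limma results (names and values invented for illustration).
toy_results <- list(
  A = data.frame(Gene.Symbol = c("gene1", "gene2", "gene3", "gene4"),
                 logFC = c(-1.2, 0.8, -0.5, 0.4), stringsAsFactors = FALSE),
  B = data.frame(Gene.Symbol = c("gene1", "gene2", "gene3", "gene4"),
                 logFC = c(-0.9, 0.6, 0.2, 0.7), stringsAsFactors = FALSE),
  C = data.frame(Gene.Symbol = c("gene1", "gene2", "gene4"),
                 logFC = c(-0.3, 1.1, 0.5), stringsAsFactors = FALSE)
)

directions <- lapply(toy_results, split_by_direction)

# Genes that change in the same direction in every dataset.
common_up   <- Reduce(intersect, lapply(directions, `[[`, "up"))
common_down <- Reduce(intersect, lapply(directions, `[[`, "down"))

print(common_up)
print(common_down)
```

Here `gene3` drops out of both lists because it changes direction between datasets A and B, which is the behaviour wanted from a consensus signature.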
Mouse data was sourced from GEO (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE4758), referenced as CON. However, during the analysis I realised that alpha synuclein had a positive log fold change, even though every other dataset reported a negative log fold change for sporadic Parkinson’s disease. Looking back at the dataset, I realised it was an overexpression model rather than a mutant. Whoops!
What’s interesting is that, canonically, alpha synuclein has been suggested to be overexpressed in sporadic Parkinson’s patients.
As was done with the TDP-43 pipeline, I had to find other signals that could be contaminating the sporadic signal. I decided to use the ALS datasets from the TDP-43 analysis (C9orf72, sALS, PET, RAV, SOD1, FUS) to represent “general neurodegeneration”, but there was actually very little overlap:
ALOX5 ALPK1 STOM MAFF COL7A1
PMS2P1 ATP6V1G2 INSIG1 RRAGA TPGS2 MTCH2 PPP2CA GLO1 PDHB
In a similar vein, I decided to source some sporadic Parkinson’s disease blood data, to make sure the filtering was leaving a more tissue-specific signal. I found two datasets on GEO: GSE99039 (AMA) and GSE72267 (RON). These were bigger datasets than I usually use: AMA had 353 samples in total, RON 59. What I realised was that my filter for the presence calls was still set to 2. This meant that 24,000 genes/probes passed filtering, which was producing huge overlaps. A filter of 2 wasn’t really proportional to the number of samples in AMA, so I increased it to 20 (as the dataset is approximately 10 times larger than the other datasets I have been using).
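One way to make the presence-call cut-off proportional to dataset size is sketched below. This is an illustration only: the `scaled_cutoff` helper, the reference size of 35 samples and the toy call matrix are assumptions made for the example, not part of the original scripts (which simply hard-code `countP>20` for AMA).

```r
# Hypothetical helper: scale a presence-call cut-off with dataset size.
# base_cutoff and reference_n are illustrative assumptions.
scaled_cutoff <- function(n_samples, base_cutoff = 2, reference_n = 35) {
  max(base_cutoff, round(base_cutoff * n_samples / reference_n))
}

scaled_cutoff(35)   # a dataset of the assumed typical size keeps the cut-off of 2
scaled_cutoff(353)  # a dataset about ten times larger gets a cut-off of 20

# Toy detection-call matrix: rows are probes, columns are samples ("P" = present).
calls <- matrix(c("P", "P", "A", "P",
                  "A", "A", "A", "P",
                  "P", "P", "P", "P"),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("probe1", "probe2", "probe3"), paste0("s", 1:4)))

# Count presence calls per probe and keep probes above the cut-off.
countP <- rowSums(calls == "P")
kept <- names(countP)[countP > scaled_cutoff(ncol(calls))]
print(kept)
```

Deriving the cut-off from the sample count avoids having to remember to change a hard-coded number each time a dataset of a different size is added.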
These two datasets produced a common gene expression signature of approximately 5000 genes, which is quite a lot of concordance. When you compare this to the brain signal, there is an overlap of 88 upregulated and 76 downregulated genes. This is quite a small overlap considering the size of the blood signature.
At this point, the remaining gene list for the sporadic signature is 558 genes (look for Filt_ALS_blood files)
Down http://amp.pharm.mssm.edu/Enrichr/enrich?dataset=3rwmj
Up http://amp.pharm.mssm.edu/Enrichr/enrich?dataset=3rwmr
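The filtering step that removes the ALS and blood overlap from the brain signature amounts to a set difference. A minimal sketch with invented placeholder gene names (the real lists are in the Filt_ALS_blood files):

```r
# Placeholder gene lists standing in for the real signatures (names invented).
brain_up <- c("geneA", "geneB", "geneC", "geneD", "geneE")
als_up   <- c("geneA", "geneD")
blood_up <- c("geneE", "geneX")

# Anything shared with the ALS or blood signatures is treated as non-specific.
contaminants <- union(als_up, blood_up)
filtered_up  <- setdiff(brain_up, contaminants)
removed_up   <- intersect(brain_up, contaminants)

print(filtered_up)
print(removed_up)
```

The same two lines would be applied separately to the upregulated and downregulated lists, so that a gene is only removed when it overlaps in the same direction.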
There was a dataset I labelled “CON” (GSE number GSE4758) that I included for a little while, as it was a mouse model of alpha synuclein. I later removed it as the research question changed.
Oliver Bandmann mentioned that LRRK2 patients present with disease that resembles sporadic patients more closely than other familial patients do. I thought this was interesting and could be a good application of my methodology.
I sourced two datasets with LRRK2 patient data: GSE34516 and GSE23290. These were exon arrays, so they first had to be preprocessed by Wenbin in the Affymetrix Console using RMA-sketch. The output was normalised expression data. mas5calls didn’t seem to work, so I skipped this step.
#### BOT GSE34516 #########
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Data/")
Data <- read.csv("GSE34516.RMA-GENE-EXTENDED-EC-hg19-na36_LRRK2only.csv", row.names = 1)
genename <- as.data.frame(Data$GeneSymbol)
rownames(genename) <- rownames(Data)
colnames(genename) <- "Gene.Symbol"
#Data is already normalised using RMA-sketch
analysis.name<-"BOT" #Label analysis
expressionMatrix<-Data[,1:6] #takes expression from normalised expression set
Treat<-factor(rep(c("Control", "Patient"),c(4,2)), levels=c("Control", "Patient"))
design<-model.matrix(~Treat)
rownames(design)<-colnames(expressionMatrix)
design
#Conduct statistical analysis of expression
library(limma)
fit<-lmFit(expressionMatrix, design) #linear model fit
fit<-eBayes(fit)
result<-topTable(fit, coef="TreatPatient", adjust="BH", number=nrow(expressionMatrix)) #"BH" adjust for multiple hypothesis testing
#toptable normally takes top number but this takes all
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression")
write.csv(result, file = paste(analysis.name, "_result.csv", sep = ""))
result$"Fold Change"<-2^result$logFC
result$"Fold Change"[result$"Fold Change"<1]<-(-1)/result$"Fold Change"[result$"Fold Change"<1] #express down-regulation as a negative fold change (e.g. 0.5 becomes -2)
expressionLinear<-as.data.frame(2^expressionMatrix)
expressionLinear$ProbeSetID<-rownames(expressionLinear)
result<-merge(result, expressionLinear, by = 0) #merge values into one array
result<-merge(result, genename, by.x = "Row.names", by.y = 0)
result<-subset(result, subset=(Gene.Symbol !="---")) #if no gene symbol, discount
setwd(dir = "/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
write.csv(result, file=paste(analysis.name, "_finalresult.csv", sep=""), row.names=FALSE, quote = FALSE)
genesort <- result[order(result$P.Value),]
uniqueresult <- genesort[!duplicated(genesort$Gene.Symbol),] #keep the most significant row per gene symbol (Gene.Symbol is column 16 here)
write.csv(uniqueresult, file=paste(analysis.name, "rankeduniqueresult.csv", sep=""), row.names=FALSE, quote = FALSE)
#### BOT2 GSE23290 #########
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Data/")
Data <- read.csv("GSE23290.RMA-GENE-EXTENDED-EC-hg19-na36_LRRK2only.csv", row.names = 1)
genename <- as.data.frame(Data$GeneSymbol)
rownames(genename) <- rownames(Data)
colnames(genename) <- "Gene.Symbol"
#Data is already normalised using RMA-sketch
analysis.name<-"BOT2" #Label analysis
expressionMatrix<-Data[,1:8] #takes expression from normalised expression set
Treat<-factor(rep(c("Control", "Patient"),c(5,3)), levels=c("Control", "Patient"))
design<-model.matrix(~Treat)
rownames(design)<-colnames(expressionMatrix)
design
#Conduct statistical analysis of expression
library(limma)
fit<-lmFit(expressionMatrix, design) #linear model fit
fit<-eBayes(fit)
result<-topTable(fit, coef="TreatPatient", adjust="BH", number=nrow(expressionMatrix)) #"BH" adjust for multiple hypothesis testing
#toptable normally takes top number but this takes all
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression")
write.csv(result, file = paste(analysis.name, "_result.csv", sep = ""))
result$"Fold Change"<-2^result$logFC
result$"Fold Change"[result$"Fold Change"<1]<-(-1)/result$"Fold Change"[result$"Fold Change"<1] #express down-regulation as a negative fold change (e.g. 0.5 becomes -2)
expressionLinear<-as.data.frame(2^expressionMatrix)
expressionLinear$ProbeSetID<-rownames(expressionLinear)
result<-merge(result, expressionLinear, by = 0) #merge values into one array
result<-merge(result, genename, by.x = "Row.names", by.y = 0)
result<-subset(result, subset=(Gene.Symbol !="---")) #if no gene symbol, discount
setwd(dir = "/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
write.csv(result, file=paste(analysis.name, "_finalresult.csv", sep=""), row.names=FALSE, quote = FALSE)
genesort <- result[order(result$P.Value),]
uniqueresult <- genesort[!duplicated(genesort$Gene.Symbol),] #keep the most significant row per gene symbol (Gene.Symbol is column 18 here)
write.csv(uniqueresult, file=paste(analysis.name, "rankeduniqueresult.csv", sep=""), row.names=FALSE, quote = FALSE)
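The fold-change conversion that appears in each of these scripts can be isolated as a small function, which makes its behaviour easier to check. The function name `signed_fold_change` is introduced here for illustration only; the logic mirrors the two `result$"Fold Change"` lines above.

```r
# Convert log2 fold changes to signed linear fold changes:
# up-regulation stays as 2^logFC, down-regulation becomes -1 / 2^logFC.
signed_fold_change <- function(logFC) {
  fc <- 2^logFC
  ifelse(fc < 1, -1 / fc, fc)
}

# logFC of 1, -1, 0 and 2 give fold changes of 2, -2, 1 and 4.
print(signed_fold_change(c(1, -1, 0, 2)))
```

Note that the result never falls between -1 and 1, so these values are convenient for reading but should not be averaged or used as input to further statistics; `logFC` is the better scale for that.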
A number of steps had to be skipped because these data didn’t require the annotation file. About 28,000 rows were left over, which was quite a lot, though approximately 7,000 of these were LOC transcripts, which took us down to the number of genes you would expect.
After adding these two datasets to the FoldChange_PD.R script, and filtering out for ALS and sporadic PD blood, there were 85 upregulated and 109 downregulated genes. This still included MSX1, SNCA, UCHL1, and ALDH1A1 from PD Malacards but not PINK1 anymore.
EnrichR UP: http://amp.pharm.mssm.edu/Enrichr/enrich?dataset=3t9ff DOWN: http://amp.pharm.mssm.edu/Enrichr/enrich?dataset=3t9fh
The other datasets I had were fibroblasts from PARK2 and LRRK2 patients, as provided by Robin Highly/Oliver Bandmann/Heather Mortiboys. These were exon array datasets, which were analysed by Wenbin using the Affymetrix console. When these were added to the filters, the number of common genes was 36, and of the genes highlighted previously only MSX1 was retained.
To give a little more signal, and also to see whether the signal could be recovered, I excluded the blood data (as fibroblast is representative enough of a non-neuronal tissue), leaving 54 genes:
DOCK2 RAD52 TNFAIP8 P4HA1 MS4A6A CD300A KLHL36 TRIM5 CD163 CXCR4 RIPK2 PAWR ZNF516 JMJD6 CASP7 MSX1 IGF1R LRRC1 SAFB2 C1QB IL17RB LAMB2 CCDC40 CDA FMO5 DNASE2 NFE2 ZNF217 HMOX1 PARP16 HP1BP3 PGPEP1 CD14 NPIPB15 IMP4 LETMD1 AP1S1 FHIT MECR ATP2B2 DCTPP1 WDYHV1 PSMC2 NRXN3 MRTO4 NDUFS3 SLC6A1 NME1 MRPL15 COPS3 TUBG1 PRC1 BFSP1 IARS
Approximately 10 of these genes have a relatively strong relationship with Parkinson’s or LRRK2 specifically. 32 were directly related according to VarElect, 22 indirectly related (VarElect_Results_cmgreen1-sheffield-ac-uk-20180522-014621619).
The next step was to find the PPI partners of these 54 genes.
#### ALS PARK2 LRRK2 ####
library(biomaRt)
setwd("/Users/clairegreen/Documents/PhD/TDP-43/TDP-43_Code/Results/PPI_Network/")
PPI <- read.table("iref14_Human_UP_noDup_table_nodash.txt", header = T)
braingenes <- read.csv("Zhang_BrainCelltype_Markers_braingenes.csv", header = T)
DEG_list <- readLines("~/Documents/PhD/Parkinsons/Parkinsons_Code/Results/ALSPARK2LRRK2/removeallALLgenes.txt")
mart <- useMart("ENSEMBL_MART_ENSEMBL",dataset="hsapiens_gene_ensembl", host="www.ensembl.org")
attributes <- listAttributes(mart)
mart_back <- getBM(attributes =c("hgnc_symbol", "uniprotswissprot"), filters="hgnc_symbol", values=DEG_list, mart=mart)
genelist_Uniprot <- subset(mart_back, !(mart_back$uniprotswissprot == ""))
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/ALSPARK2LRRK2/")
write.csv(genelist_Uniprot, "martback.csv", row.names = F)
# IDENTIFY MISSING GENES AND FIND UNIPROT CODES FOR THEM. NB SOME GENES MAY NOT BE PROTEIN CODING #
mart_table <- read.csv("martback.csv", header = T) #A table with the uniprot codes for the DEGs
uniprot_gene <- mart_table$uniprotswissprot
DEG_PPI <- subset(PPI, PPI$V1 %in% uniprot_gene | PPI$V2 %in% uniprot_gene)
rownames(DEG_PPI) <- 1:nrow(DEG_PPI)
write.csv(DEG_PPI, "DEG_PPI_ALP.csv", row.names = F)
DEG_PPI <- read.csv("DEG_PPI_ALP.csv")
DEG_PPI <- subset(DEG_PPI, DEG_PPI$Gene1 !="-")
DEG_PPI <- subset(DEG_PPI, DEG_PPI$Gene2 !="-")
write.csv(DEG_PPI, "FinalPDPPI.csv", row.names = F, quote = F)
From this we have a network of 1367 interactions between 1034 proteins (file = PD_DEGPPI.R, DEG_PPI_ALP.csv, ALP_PPIgenes.txt. Gene names used).
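The interaction and protein counts can be recomputed from the edge list along these lines. This is a sketch on a toy table: the `Gene1`/`Gene2` column names follow the script above, but whether the reported counts treated A-B and B-A as one interaction is an assumption made here.

```r
# Toy edge list standing in for DEG_PPI_ALP.csv (names invented).
edges <- data.frame(Gene1 = c("A", "B", "A", "C"),
                    Gene2 = c("B", "A", "C", "D"),
                    stringsAsFactors = FALSE)

# Treat A-B and B-A as the same undirected interaction (assumption).
key <- paste(pmin(edges$Gene1, edges$Gene2),
             pmax(edges$Gene1, edges$Gene2), sep = "_")
edges_unique <- edges[!duplicated(key), ]

n_interactions <- nrow(edges_unique)
n_proteins <- length(unique(c(edges_unique$Gene1, edges_unique$Gene2)))

print(c(interactions = n_interactions, proteins = n_proteins))
```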
Next I had to find the corresponding rows in each of the RNA expression data files, ready for the correlation analysis. The LRRK2 samples were not included in the coexpression analysis because, with only 2 and 3 patients respectively, they would result in rho values of only 1 or -1 (2 samples) or 1, 0.5, -0.5 or -1 (3 samples, barring ties), which are difficult to threshold. I also removed the MOR.FC dataset, as the distribution of its rho values was abnormal.
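A quick way to see why very small groups are hard to threshold is to enumerate every Spearman rho that is attainable for a given number of untied samples. The helpers `permutations` and `possible_rho` are written for this illustration and are not part of the analysis scripts.

```r
# Generate every ordering of the elements of v.
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in permutations(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

# Collect the distinct Spearman rho values between 1..n and each of its orderings.
possible_rho <- function(n) {
  rhos <- sapply(permutations(seq_len(n)),
                 function(p) cor(seq_len(n), p, method = "spearman"))
  sort(unique(round(rhos, 3)))
}

print(possible_rho(2))  # -1 and 1
print(possible_rho(3))  # -1, -0.5, 0.5 and 1
```

With so few attainable values, a cut-off such as 0.1 cannot distinguish weak from strong coexpression, which supports leaving these datasets out of the correlation step.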
#### ALP####
#Read in network nodes
DEG_PPI <- readLines("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/ALSPARK2LRRK2/ALP_PPIGenes.txt")
setwd("/users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
#Extract PPI network genes from each dataset#####
LEW <- read.csv("LEWfilteredresult.csv")
rownames(LEW) <- LEW$Gene.Symbol
LEW <- LEW[,28:41]
LEW <- subset(LEW, rownames(LEW) %in% DEG_PPI)
MID3 <- read.csv("MID3filteredresult.csv")
rownames(MID3) <- MID3$Gene.Symbol
MID3 <- MID3[,36:49]
MID3 <- subset(MID3, rownames(MID3) %in% DEG_PPI)
MID4 <- read.csv("MID4filteredresult.csv")
rownames(MID4) <- MID4$Gene.Symbol
MID4 <- MID4[,41:55]
MID4 <- subset(MID4, rownames(MID4) %in% DEG_PPI)
MOR.FC <- read.csv("MOR.FCfilteredresult.csv")
rownames(MOR.FC) <- MOR.FC$Gene.Symbol
MOR.FC <- MOR.FC[,24:28]
MOR.FC <- subset(MOR.FC, rownames(MOR.FC) %in% DEG_PPI)
DIJ <- read.csv("DIJfilteredresult.csv")
rownames(DIJ) <- DIJ$Gene.Symbol
DIJ <- DIJ[,29:43]
DIJ <- subset(DIJ, rownames(DIJ) %in% DEG_PPI)
FFR <- read.csv("FFRfilteredresult.csv")
rownames(FFR) <- FFR$Gene.Symbol
FFR <- FFR[,29:44]
FFR <- subset(FFR, rownames(FFR) %in% DEG_PPI)
MID1 <- read.csv("MID1filteredresult.csv")
rownames(MID1) <- MID1$Gene.Symbol
MID1 <- MID1[,28:37]
MID1 <- subset(MID1, rownames(MID1) %in% DEG_PPI)
MID2 <- read.csv("MID2filteredresult.csv")
rownames(MID2) <- MID2$Gene.Symbol
MID2 <- MID2[,39:49]
MID2 <- subset(MID2, rownames(MID2) %in% DEG_PPI)
MOR.SN <- read.csv("MOR.SNfilteredresult.csv")
rownames(MOR.SN) <- MOR.SN$Gene.Symbol
MOR.SN <- MOR.SN[,36:59]
MOR.SN <- subset(MOR.SN, rownames(MOR.SN) %in% DEG_PPI)
DUM <- read.csv("DUM_UniqueGene_DESeq2.csv")
rownames(DUM) <- DUM$hgnc_symbol
DUM <- DUM[,53:81]
DUM <- subset(DUM, rownames(DUM) %in% DEG_PPI)
#Find the gene names that all datasets have in common
DEG_com <- Reduce(intersect, list(rownames(DIJ), rownames(FFR),
rownames(LEW),rownames(MID1),rownames(MID2),
rownames(MID3),rownames(MID4),rownames(MOR.FC),
rownames(MOR.SN), rownames(DUM)))
This script resulted in 854 common names. Correlation analysis was conducted on sharc (/shared/hidelab2/user/mdp15cmg/Parkinsons/PD_Cor)
When the correlation relationships were downloaded and analysed, only 59 edges reached a consistent correlation value of over 0.1 or below -0.1 regardless of sign. This increases to 208 when the RNA-seq dataset is removed.
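The consistency filter can be sketched as below. The correlation script run on sharc is not shown in these notes, so this is one reading of the criterion: an edge is kept when its rho is above 0.1 in every dataset or below -0.1 in every dataset. The matrix values and edge names are invented.

```r
# Toy matrix of correlation values: rows are edges, columns are datasets (values invented).
rho <- matrix(c( 0.40,  0.25,  0.15,
                 0.30, -0.20,  0.50,
                -0.35, -0.12, -0.60,
                 0.05,  0.30,  0.20),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("edge", 1:4), c("D1", "D2", "D3")))

threshold <- 0.1

# An edge is consistent if it clears the threshold with the same sign in every dataset.
all_positive <- apply(rho >  threshold, 1, all)
all_negative <- apply(rho < -threshold, 1, all)
consistent_edges <- rownames(rho)[all_positive | all_negative]

print(consistent_edges)
```

Because every dataset has to agree, a single noisy dataset can remove many edges, which is consistent with the count rising from 59 to 208 when the RNA-seq dataset is dropped.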
Talking to Kat we hypothesised that the Parkin Fibroblasts were perhaps driving away the signal too strongly. Also the number of samples was so small that it’s not necessarily giving a representative signal.
To compensate, I decided to go back to the blood samples and use some other samples from the AMA GSE99039 dataset which have familial mutations. There were 5 ATP13A2 cases, 15 Parkin cases, and 12 PINK1 cases. Two of the Parkin cases in the metadata didn’t have corresponding files, so this became 13 Parkin cases.
##### AMA ATP13A2 #########
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Data/GSE99039_keepfiles/")
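library(affy) #provides ReadAffy, rma and mas5calls
library(tkWidgets) #provides fileBrowser and hasSuffix
#run program to choose .CEL files from directory
celfiles <- fileBrowser(textToShow = "Choose CEL files", testFun = hasSuffix("[cC][eE][lL]"))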
Data<-ReadAffy(filenames=celfiles) #read in files
rmaEset<-rma(Data) #normalise using RMA
analysis.name<-"AMA_ATP13A2" #Label analysis
dataMatrixAll<-exprs(rmaEset) #takes expression from normalised expression set
#mas5call generates presence/absence calls for each probeset
mas5call<-mas5calls(Data)
callMatrixAll<-exprs(mas5call)
colnames(callMatrixAll)<-sub(".CEL", ".mas5-Detection", colnames(callMatrixAll),fixed=TRUE)
colnames(callMatrixAll)<-sub(".cel", ".mas5-Detection", colnames(callMatrixAll),fixed=TRUE)
callMatrixAll<-as.data.frame(callMatrixAll)
callMatrixAll$ProbeSetID<-rownames(callMatrixAll)
countPf<-function(x){
sum(x=="P")
}
#count how many samples have presence calls
countPl<-apply(callMatrixAll, 1, countPf)
callMatrixAll$ProbeSetID<-rownames(callMatrixAll)
countPdf<-data.frame(ProbeSetID=names(countPl), countP=countPl)
#read annotation
# USING ANNOTATION FILE (if .csv, convert to .txt using excel)
annotation.file<-"/Users/clairegreen/Documents/PhD/TDP-43/TDP-43_Data/HG-U133_Plus_2.na35.annot.csv/HG-U133_Plus_2.na35_SHORT.annot.txt"
annotation<-read.table(annotation.file, header = TRUE, row.names=NULL, sep="\t", skip=0, stringsAsFactors=F, quote = "", comment.char="!", fill = TRUE, as.is = TRUE)
dim(annotation)
nrow(annotation)
annotation<-subset( annotation, subset=(Gene.Symbol !="---")) #if no gene symbol, discount
# Remove rows in which genes are noted to have negative strand matching probes
idxNegativeStrand<-grep("Negative Strand Matching Probes", annotation$Annotation.Notes)
if(length(idxNegativeStrand)>0)
{
annotation<-annotation[-idxNegativeStrand,]
}
expressionMatrix<-exprs(rmaEset)
colnames(expressionMatrix)
#assumes the CEL files were selected in order: 183 controls followed by 5 patients
Treat<-factor(rep(c("Control", "Patient"),c(183,5)), levels=c("Control", "Patient"))
design<-model.matrix(~Treat)
rownames(design)<-colnames(expressionMatrix)
design
#Conduct statistical analysis of expression
library(limma)
fit<-lmFit(expressionMatrix, design) #linear model fit
fit<-eBayes(fit)
result<-topTable(fit, coef="TreatPatient", adjust="BH", number=nrow(expressionMatrix)) #"BH" adjust for multiple hypothesis testing
#toptable normally takes top number but this takes all
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression")
write.csv(result, file = paste(analysis.name, "_result.csv", sep = ""))
result$"ProbeSetID"<-rownames(result) #copy the probeset IDs from the row names into a column
head(result$"ProbeSetID")
result$"Fold Change"<-2^result$logFC
result$"Fold Change"[result$"Fold Change"<1]<-(-1)/result$"Fold Change"[result$"Fold Change"<1] #express down-regulation as a negative fold change (e.g. 0.5 becomes -2)
expressionLinear<-as.data.frame(2^expressionMatrix)
expressionLinear$ProbeSetID<-rownames(expressionLinear)
result<-merge(result, expressionLinear, by.x="ProbeSetID", by.y="ProbeSetID") #merge values into one array
result<-merge(annotation, result, by.x="Probe.Set.ID", by.y="ProbeSetID")
result<-merge(result, countPdf, by.x="Probe.Set.ID", by.y="ProbeSetID")
result$Gene.Symbol <- sapply(strsplit(result$Gene.Symbol,"///"), `[`, 1)
result$Ensembl <- sapply(strsplit(result$Ensembl,"///"), `[`, 1)
setwd(dir = "/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
write.csv(result, file=paste(analysis.name, "_finalresult.csv", sep=""), row.names=FALSE, quote = FALSE)
genesort <- result[order(result$P.Value),]
uniqueresult <- genesort[!duplicated(genesort[,5]),]
write.csv(uniqueresult, file=paste(analysis.name, "rankeduniqueresult.csv", sep=""), row.names=FALSE, quote = FALSE)
uniqueresult<-subset(uniqueresult, Gene.Symbol!="") #removes any probes for which there are no gene symbols
uniqueresult<-subset(uniqueresult, subset=(countP>20)) #only takes results that have more than 20 samples with a presence call for a probe
write.csv(uniqueresult, file=paste(analysis.name, "filteredresult.csv", sep=""), row.names=FALSE, quote = FALSE)
##### AMA PINK1 ##########
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Data/GSE99039_keepfiles/")
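library(affy) #provides ReadAffy, rma and mas5calls
library(tkWidgets) #provides fileBrowser and hasSuffix
#run program to choose .CEL files from directory
celfiles <- fileBrowser(textToShow = "Choose CEL files", testFun = hasSuffix("[cC][eE][lL]"))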
Data<-ReadAffy(filenames=celfiles) #read in files
rmaEset<-rma(Data) #normalise using RMA
analysis.name<-"AMA_PINK1" #Label analysis
dataMatrixAll<-exprs(rmaEset) #takes expression from normalised expression set
#mas5call generates presence/absence calls for each probeset
mas5call<-mas5calls(Data)
callMatrixAll<-exprs(mas5call)
colnames(callMatrixAll)<-sub(".CEL", ".mas5-Detection", colnames(callMatrixAll),fixed=TRUE)
colnames(callMatrixAll)<-sub(".cel", ".mas5-Detection", colnames(callMatrixAll),fixed=TRUE)
callMatrixAll<-as.data.frame(callMatrixAll)
callMatrixAll$ProbeSetID<-rownames(callMatrixAll)
countPf<-function(x){
sum(x=="P")
}
#count how many samples have presence calls
countPl<-apply(callMatrixAll, 1, countPf)
callMatrixAll$ProbeSetID<-rownames(callMatrixAll)
countPdf<-data.frame(ProbeSetID=names(countPl), countP=countPl)
#read annotation
# USING ANNOTATION FILE (if .csv, convert to .txt using excel)
annotation.file<-"/Users/clairegreen/Documents/PhD/TDP-43/TDP-43_Data/HG-U133_Plus_2.na35.annot.csv/HG-U133_Plus_2.na35_SHORT.annot.txt"
annotation<-read.table(annotation.file, header = TRUE, row.names=NULL, sep="\t", skip=0, stringsAsFactors=F, quote = "", comment.char="!", fill = TRUE, as.is = TRUE)
dim(annotation)
nrow(annotation)
annotation<-subset( annotation, subset=(Gene.Symbol !="---")) #if no gene symbol, discount
# Remove rows in which genes are noted to have negative strand matching probes
idxNegativeStrand<-grep("Negative Strand Matching Probes", annotation$Annotation.Notes)
if(length(idxNegativeStrand)>0)
{
annotation<-annotation[-idxNegativeStrand,]
}
expressionMatrix<-exprs(rmaEset)
colnames(expressionMatrix)
#assumes the CEL files were selected in order: 183 controls followed by 12 patients
Treat<-factor(rep(c("Control", "Patient"),c(183,12)), levels=c("Control", "Patient"))
design<-model.matrix(~Treat)
rownames(design)<-colnames(expressionMatrix)
design
#Conduct statistical analysis of expression
library(limma)
fit<-lmFit(expressionMatrix, design) #linear model fit
fit<-eBayes(fit)
result<-topTable(fit, coef="TreatPatient", adjust="BH", number=nrow(expressionMatrix)) #"BH" adjust for multiple hypothesis testing
#toptable normally takes top number but this takes all
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression")
write.csv(result, file = paste(analysis.name, "_result.csv", sep = ""))
result$"ProbeSetID"<-rownames(result) #copy probeset IDs from the row names into a column for merging
head(result$"ProbeSetID")
result$"Fold Change"<-2^result$logFC
result$"Fold Change"[result$"Fold Change"<1]<-(-1)/result$"Fold Change"[result$"Fold Change"<1] #express fold changes below 1 as negative reciprocals, so values are >=1 (up) or <=-1 (down)
expressionLinear<-as.data.frame(2^expressionMatrix)
expressionLinear$ProbeSetID<-rownames(expressionLinear)
result<-merge(result, expressionLinear, by.x="ProbeSetID", by.y="ProbeSetID") #merge values into one array
result<-merge(annotation, result, by.x="Probe.Set.ID", by.y="ProbeSetID")
result<-merge(result, countPdf, by.x="Probe.Set.ID", by.y="ProbeSetID")
result$Gene.Symbol <- sapply(strsplit(result$Gene.Symbol,"///"), `[`, 1)
result$Ensembl <- sapply(strsplit(result$Ensembl,"///"), `[`, 1)
setwd(dir = "/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
write.csv(result, file=paste(analysis.name, "_finalresult.csv", sep=""), row.names=FALSE, quote = FALSE)
genesort <- result[order(result$P.Value),]
uniqueresult <- genesort[!duplicated(genesort[,5]),]
write.csv(uniqueresult, file=paste(analysis.name, "rankeduniqueresult.csv", sep=""), row.names=FALSE, quote = FALSE)
uniqueresult<-subset(uniqueresult, Gene.Symbol!="") #removes any probes for which there are no gene symbols
uniqueresult<-subset(uniqueresult, subset=(countP>20)) #only keeps probes with a presence call in more than 20 samples
write.csv(uniqueresult, file=paste(analysis.name, "filteredresult.csv", sep=""), row.names=FALSE, quote = FALSE)
##### AMA PARKIN ##########
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Data/GSE99039_keepfiles/")
#run program to choose .CEL files from directory
celfiles <- fileBrowser(textToShow = "Choose CEL files", testFun = hasSuffix("[cC][eE][lL]"))
Data<-ReadAffy(filenames=celfiles) #read in files
rmaEset<-rma(Data) #normalise using RMA
analysis.name<-"AMA_PARKIN" #Label analysis
dataMatrixAll<-exprs(rmaEset) #takes expression from normalised expression set
#mas5call generates presence/absence calls for each probeset
mas5call<-mas5calls(Data)
callMatrixAll<-exprs(mas5call)
colnames(callMatrixAll)<-sub(".CEL", ".mas5-Detection", colnames(callMatrixAll),fixed=TRUE)
colnames(callMatrixAll)<-sub(".cel", ".mas5-Detection", colnames(callMatrixAll),fixed=TRUE)
callMatrixAll<-as.data.frame(callMatrixAll)
callMatrixAll$ProbeSetID<-rownames(callMatrixAll)
countPf<-function(x){
sum(x=="P")
}
#count how many samples have a presence ("P") call for each probeset
countPl<-apply(callMatrixAll, 1, countPf)
countPdf<-data.frame(ProbeSetID=names(countPl), countP=countPl)
#read annotation
# USING ANNOTATION FILE (if .csv, convert to .txt using excel)
annotation.file<-"/Users/clairegreen/Documents/PhD/TDP-43/TDP-43_Data/HG-U133_Plus_2.na35.annot.csv/HG-U133_Plus_2.na35_SHORT.annot.txt"
annotation<-read.table(annotation.file, header = TRUE, row.names=NULL, sep="\t", skip=0, stringsAsFactors=F, quote = "", comment.char="!", fill = TRUE, as.is = TRUE)
dim(annotation)
nrow(annotation)
annotation<-subset( annotation, subset=(Gene.Symbol !="---")) #if no gene symbol, discount
# Remove rows in which genes are noted to have negative strand matching probes
idxNegativeStrand<-grep("Negative Strand Matching Probes", annotation$Annotation.Notes)
if(length(idxNegativeStrand)>0)
{
annotation<-annotation[-idxNegativeStrand,]
}
expressionMatrix<-exprs(rmaEset)
colnames(expressionMatrix)
#two-group design; assumes the first 183 columns of expressionMatrix are controls and the remaining 13 are patients
Treat<-factor(rep(c("Control", "Patient"),c(183,13)), levels=c("Control", "Patient"))
design<-model.matrix(~Treat)
rownames(design)<-colnames(expressionMatrix)
design
#Conduct statistical analysis of expression
library(limma)
fit<-lmFit(expressionMatrix, design) #linear model fit
fit<-eBayes(fit)
result<-topTable(fit, coef="TreatPatient", adjust="BH", number=nrow(expressionMatrix)) #"BH" adjust for multiple hypothesis testing
#toptable normally takes top number but this takes all
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression")
write.csv(result, file = paste(analysis.name, "_result.csv", sep = ""))
result$"ProbeSetID"<-rownames(result) #copy probeset IDs from the row names into a column for merging
head(result$"ProbeSetID")
result$"Fold Change"<-2^result$logFC
result$"Fold Change"[result$"Fold Change"<1]<-(-1)/result$"Fold Change"[result$"Fold Change"<1] #express fold changes below 1 as negative reciprocals, so values are >=1 (up) or <=-1 (down)
expressionLinear<-as.data.frame(2^expressionMatrix)
expressionLinear$ProbeSetID<-rownames(expressionLinear)
result<-merge(result, expressionLinear, by.x="ProbeSetID", by.y="ProbeSetID") #merge values into one array
result<-merge(annotation, result, by.x="Probe.Set.ID", by.y="ProbeSetID")
result<-merge(result, countPdf, by.x="Probe.Set.ID", by.y="ProbeSetID")
result$Gene.Symbol <- sapply(strsplit(result$Gene.Symbol,"///"), `[`, 1)
result$Ensembl <- sapply(strsplit(result$Ensembl,"///"), `[`, 1)
setwd(dir = "/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
write.csv(result, file=paste(analysis.name, "_finalresult.csv", sep=""), row.names=FALSE, quote = FALSE)
genesort <- result[order(result$P.Value),]
uniqueresult <- genesort[!duplicated(genesort[,5]),]
write.csv(uniqueresult, file=paste(analysis.name, "rankeduniqueresult.csv", sep=""), row.names=FALSE, quote = FALSE)
uniqueresult<-subset(uniqueresult, Gene.Symbol!="") #removes any probes for which there are no gene symbols
uniqueresult<-subset(uniqueresult, subset=(countP>20)) #only keeps probes with a presence call in more than 20 samples
write.csv(uniqueresult, file=paste(analysis.name, "filteredresult.csv", sep=""), row.names=FALSE, quote = FALSE)
#############################################################
################### FOLD CHANGE DEG #########################
#############################################################
setwd("/users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
LEW <- read.csv("LEWfilteredresult.csv")
LEW <- LEW[order(LEW$P.Value),]
MID3 <- read.csv("MID3filteredresult.csv")
MID3 <- MID3[order(MID3$P.Value),]
MID4 <- read.csv("MID4filteredresult.csv")
MID4 <- MID4[order(MID4$P.Value),]
MOR.FC <- read.csv("MOR.FCfilteredresult.csv")
MOR.FC <- MOR.FC[order(MOR.FC$P.Value),]
DIJ <- read.csv("DIJfilteredresult.csv")
DIJ <- DIJ[order(DIJ$P.Value),]
FFR <- read.csv("FFRfilteredresult.csv")
FFR <- FFR[order(FFR$P.Value),]
MID1 <- read.csv("MID1filteredresult.csv")
MID1 <- MID1[order(MID1$P.Value),]
MID2 <- read.csv("MID2filteredresult.csv")
MID2 <- MID2[order(MID2$P.Value),]
MOR.SN <- read.csv("MOR.SNfilteredresult.csv")
MOR.SN <- MOR.SN[order(MOR.SN$P.Value),]
DUM <- read.csv("DUM_UniqueGene_DESeq2.csv")
DUM <- DUM[order(DUM$pvalue),]
BOT <- read.csv("BOTrankeduniqueresult.csv")
BOT <- BOT[order(BOT$P.Value),]
BOT2 <- read.csv("BOT2rankeduniqueresult.csv")
BOT2 <- BOT2[order(BOT2$P.Value),]
thresh <- 1
upLEW <- subset(LEW, LEW$Fold.Change >= thresh)
upLEWgene <- upLEW$Gene.Symbol
upMID3<- subset(MID3, MID3$Fold.Change >= thresh)
upMID3gene <- upMID3$Gene.Symbol
upMID4 <- subset(MID4, MID4$Fold.Change >= thresh)
upMID4gene <- upMID4$Gene.Symbol
upMOR.FC <- subset(MOR.FC, MOR.FC$Fold.Change >= thresh)
upMOR.FCgene <- upMOR.FC$Gene.Symbol
upDUM <- subset(DUM, DUM$log2FoldChange >= 0)
upDUMgene <- upDUM$hgnc_symbol
upDIJ <- subset(DIJ, DIJ$Fold.Change >= thresh)
upDIJgene <- upDIJ$Gene.Symbol
upFFR<- subset(FFR, FFR$Fold.Change >= thresh)
upFFRgene <- upFFR$Gene.Symbol
upMID1<- subset(MID1, MID1$Fold.Change >= thresh)
upMID1gene <- upMID1$Gene.Symbol
upMID2 <- subset(MID2, MID2$Fold.Change >= thresh)
upMID2gene <- upMID2$Gene.Symbol
upMOR.SN <- subset(MOR.SN, MOR.SN$Fold.Change >= thresh)
upMOR.SNgene <- upMOR.SN$Gene.Symbol
upBOT <- subset(BOT, BOT$Fold.Change >= thresh)
upBOTgene <- upBOT$Gene.Symbol
upBOT2 <- subset(BOT2, BOT2$Fold.Change >= thresh)
upBOT2gene <- upBOT2$Gene.Symbol
INTUP <- Reduce(intersect, list(upLEWgene, upMID3gene, upMID4gene, upMOR.FCgene,
upDIJgene, upFFRgene, upMID1gene, upMID2gene, upMOR.SNgene, upDUMgene, upBOTgene, upBOT2gene))
#### DOWN ####
thresh <- -1
downLEW <- subset(LEW, LEW$Fold.Change <= thresh)
downLEWgene <- downLEW$Gene.Symbol
downMID3 <- subset(MID3, MID3$Fold.Change <= thresh)
downMID3gene <- downMID3$Gene.Symbol
downMID4 <- subset(MID4, MID4$Fold.Change <= thresh)
downMID4gene <- downMID4$Gene.Symbol
downMOR.FC <- subset(MOR.FC, MOR.FC$Fold.Change <= thresh)
downMOR.FCgene <- downMOR.FC$Gene.Symbol
downDUM <- subset(DUM, DUM$log2FoldChange <= 0)
downDUMgene <- downDUM$hgnc_symbol
downDIJ <- subset(DIJ, DIJ$Fold.Change <= thresh)
downDIJgene <- downDIJ$Gene.Symbol
downFFR<- subset(FFR, FFR$Fold.Change <= thresh)
downFFRgene <- downFFR$Gene.Symbol
downMID1<- subset(MID1, MID1$Fold.Change <= thresh)
downMID1gene <- downMID1$Gene.Symbol
downMID2 <- subset(MID2, MID2$Fold.Change <= thresh)
downMID2gene <- downMID2$Gene.Symbol
downMOR.SN <- subset(MOR.SN, MOR.SN$Fold.Change <= thresh)
downMOR.SNgene <- downMOR.SN$Gene.Symbol
downBOT <- subset(BOT, BOT$Fold.Change <= thresh)
downBOTgene <- downBOT$Gene.Symbol
downBOT2 <- subset(BOT2, BOT2$Fold.Change <= thresh)
downBOT2gene <- downBOT2$Gene.Symbol
INTDOWN <- Reduce(intersect, list(downLEWgene, downMID3gene, downMID4gene, downMOR.FCgene,
downDIJgene, downFFRgene, downMID1gene, downMID2gene,
downMOR.SNgene, downDUMgene, downBOTgene, downBOT2gene))
########################### COMMON GENES ##############################
all <- c(INTUP, INTDOWN)
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/FoldChange")
# write.table(INTUP,"Sp_LRRK2_upDEGs.txt", row.names = F, col.names = F, quote = F)
# write.table(INTDOWN,"Sp_LRRK2_downDEGs.txt", row.names = F, col.names = F, quote = F)
# write.table(all, "Sp_LRRK2_allDEGs.txt", row.names = F, col.names = F, quote = F)
PDgenes <- readLines("/Users/clairegreen/Documents/PhD/Parkinsons/ParkinsonsDiseaseMalacards.txt")
intersect(INTUP, PDgenes)
intersect(INTDOWN, PDgenes)
####### ALS signature ##########
setwd("/users/clairegreen/Documents/PhD/TDP-43/TDP-43_Code/Results/GeneExpression/noMedian/")
C9 <- read.csv("C9_unique.csv")
C9 <- C9[order(C9$P.Value),]
sals <- read.csv("sals_unique.csv")
sals <- sals[order(sals$P.Value),]
setwd("/users/clairegreen/Documents/PhD/TDP-43/TDP-43_Code/Results/GeneExpression/TDP-43_DEseq2/")
pet <- read.csv("PET_results_keepfiltering.csv")
rav <- read.csv("RAV_results_keepfiltering.csv")
setwd("/users/clairegreen/Documents/PhD/TDP-43/TDP-43_Code/Results/GeneExpression/non-TDP/")
FUS <- read.csv("FUSrankeduniqueresult.csv")
FUS <- FUS[order(FUS$P.Value),]
SOD1 <- read.csv("SOD1rankeduniqueresult.csv")
SOD1 <- SOD1[order(SOD1$P.Value),]
thresh <- 1
upC9 <- subset(C9, C9$Fold.Change >= thresh)
upC9gene <- upC9$Gene.Symbol
upSALS <- subset(sals, sals$Fold.Change >= thresh)
upSALSgene <- upSALS$Gene.Symbol
upPET <- subset(pet, pet$FoldChange >= thresh)
upPETgene <- upPET$hgnc_symbol
upRAV <- subset(rav, rav$FoldChange >= thresh)
upRAVgene <- upRAV$hgnc_symbol
upFUS <- subset(FUS, FUS$Fold.Change >= thresh)
upFUSgene <- upFUS$Gene.Symbol
upSOD1 <- subset(SOD1, SOD1$Fold.Change >= thresh)
upSOD1gene <- upSOD1$Gene.Symbol
INTUP_ALS <- Reduce(intersect, list(upC9gene, upSALSgene, upPETgene, upRAVgene, upFUSgene, upSOD1gene))
#### DOWN
thresh <- -1
downC9 <- subset(C9, C9$Fold.Change <= thresh)
downC9gene <- downC9$Gene.Symbol
downSALS <- subset(sals, sals$Fold.Change <= thresh)
downSALSgene <- downSALS$Gene.Symbol
downPET <- subset(pet, pet$FoldChange <= thresh)
downPETgene <- downPET$hgnc_symbol
downRAV <- subset(rav, rav$FoldChange <= thresh)
downRAVgene <- downRAV$hgnc_symbol
downFUS <- subset(FUS, FUS$Fold.Change <= thresh)
downFUSgene <- downFUS$Gene.Symbol
downSOD1 <- subset(SOD1, SOD1$Fold.Change <= thresh)
downSOD1gene <- downSOD1$Gene.Symbol
INTDOWN_ALS <- Reduce(intersect, list(downC9gene, downSALSgene, downPETgene, downRAVgene, downFUSgene, downSOD1gene))
##### COMMON GENES ###
upremove <- Reduce(intersect, list (INTUP, INTUP_ALS))
downremove <- Reduce(intersect, list(INTDOWN, INTDOWN_ALS))
###### REMOVE COMMON GENES ###
resultsup <- subset(INTUP, !(INTUP %in% upremove))
resultsdown <- subset(INTDOWN, !(INTDOWN %in% downremove))
results <- c(resultsup, resultsdown)
####### sPD Blood signature ##########
setwd("/users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
AMA <- read.csv("AMAfilteredresult.csv")
AMA <- AMA[order(AMA$P.Value),]
RON <- read.csv("RONfilteredresult.csv")
RON <- RON[order(RON$P.Value),]
thresh <- 1
upAMA <- subset(AMA, AMA$Fold.Change >= thresh)
upAMAgene <- upAMA$Gene.Symbol
upRON <- subset(RON, RON$Fold.Change >= thresh)
upRONgene <- upRON$Gene.Symbol
INTUP_blood <- Reduce(intersect, list(upAMAgene, upRONgene))
#### DOWN ###
thresh <- -1
downAMA <- subset(AMA, AMA$Fold.Change <= thresh)
downAMAgene <- downAMA$Gene.Symbol
downRON <- subset(RON, RON$Fold.Change <= thresh)
downRONgene <- downRON$Gene.Symbol
INTDOWN_blood <- Reduce(intersect, list(downAMAgene, downRONgene))
##### COMMON GENES ###
upremove2 <- Reduce(intersect, list(INTUP, INTUP_blood))
downremove2 <- Reduce(intersect, list(INTDOWN, INTDOWN_blood))
##### REMOVE COMMON GENES ###
resultsup <- subset(resultsup, !(resultsup %in% upremove2))
resultsdown <- subset(resultsdown, !(resultsdown %in% downremove2))
results <- c(resultsup, resultsdown)
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/FoldChange/")
# write.table(resultsup, "Sp_LRRK2_FiltALSPDblood_UPgenes.txt", quote = F, row.names = F, col.names = F)
# write.table(resultsdown, "Sp_LRRK2_FiltALSPDblood_DOWNgenes.txt", quote = F, row.names = F, col.names = F)
# write.table(results, "Sp_LRRK2_FiltALSPDblood_ALLgenes.txt", quote = F, row.names = F, col.names = F)
# cat(resultsup, sep="\n")
intersect(resultsup, PDgenes)
intersect(resultsdown, PDgenes)
####### fPD Blood signature ##########
setwd("/users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
AMA_A <- read.csv("AMA_ATP13A2filteredresult.csv")
AMA_A <- AMA_A[order(AMA_A$P.Value),]
AMA_PARK <- read.csv("AMA_PARKINfilteredresult.csv")
AMA_PARK <- AMA_PARK[order(AMA_PARK$P.Value),]
AMA_PINK <- read.csv("AMA_PINK1filteredresult.csv")
AMA_PINK <- AMA_PINK[order(AMA_PINK$P.Value),]
thresh <- 1
upAMA_A <- subset(AMA_A, AMA_A$Fold.Change >= thresh)
upAMA_Agene <- upAMA_A$Gene.Symbol
upAMA_PARK<- subset(AMA_PARK, AMA_PARK$Fold.Change >= thresh)
upAMA_PARKgene <- upAMA_PARK$Gene.Symbol
upAMA_PINK<- subset(AMA_PINK, AMA_PINK$Fold.Change >= thresh)
upAMA_PINKgene <- upAMA_PINK$Gene.Symbol
INTUP_fam_blood <- Reduce(intersect, list(upAMA_Agene, upAMA_PARKgene, upAMA_PINKgene))
#### DOWN ###
thresh <- -1
downAMA_A <- subset(AMA_A, AMA_A$Fold.Change <= thresh)
downAMA_Agene <- downAMA_A$Gene.Symbol
downAMA_PARK<- subset(AMA_PARK, AMA_PARK$Fold.Change <= thresh)
downAMA_PARKgene <- downAMA_PARK$Gene.Symbol
downAMA_PINK<- subset(AMA_PINK, AMA_PINK$Fold.Change <= thresh)
downAMA_PINKgene <- downAMA_PINK$Gene.Symbol
INTDOWN_fam_blood <- Reduce(intersect, list(downAMA_Agene, downAMA_PARKgene, downAMA_PINKgene))
##### COMMON GENES ###
upremove3 <- Reduce(intersect, list(INTUP, INTUP_fam_blood))
downremove3 <- Reduce(intersect, list(INTDOWN, INTDOWN_fam_blood))
##### REMOVE COMMON GENES ###
resultsup <- subset(resultsup, !(resultsup %in% upremove3))
resultsdown <- subset(resultsdown, !(resultsdown %in% downremove3))
results <- c(resultsup, resultsdown)
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/FamilialBlood/")
write.table(resultsup, "ALS_sfblood_UPgenes.txt", quote = F, row.names = F, col.names = F)
write.table(resultsdown, "ALS_sfblood_DOWNgenes.txt", quote = F, row.names = F, col.names = F)
write.table(results, "ALS_sfblood_ALLgenes.txt", quote = F, row.names = F, col.names = F)
cat(resultsup, sep="\n")
intersect(resultsup, PDgenes)
intersect(resultsdown, PDgenes)
This produced 170 commonly differentially expressed genes (ALS_sfblood_ALLgenes.txt). These included the Malacards genes SNCA, MSX1, UCHL1 and ALDH1A1.
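The directional filtering and intersection behind this list repeats the same two lines for every dataset, which is where copy-paste slips can creep in. A minimal sketch of the same logic as a pair of helpers follows. `direction_genes`, `consistent_genes` and the toy tables are hypothetical names for illustration only, and the sketch assumes the signed `Fold.Change` encoding (>= 1 for up, <= -1 for down) written by the limma script.

```r
# Return the gene symbols changed in one direction in a single result table.
# Assumes Fold.Change is signed: >= 1 means up, <= -1 means down.
direction_genes <- function(tab, direction = c("up", "down"),
                            fc_col = "Fold.Change", gene_col = "Gene.Symbol") {
  direction <- match.arg(direction)
  fc <- tab[[fc_col]]
  keep <- if (direction == "up") fc >= 1 else fc <= -1
  as.character(tab[[gene_col]][which(keep)])  # which() drops NA fold changes
}

# Genes changed in the same direction in every table of a list.
consistent_genes <- function(tables, direction) {
  Reduce(intersect, lapply(tables, direction_genes, direction = direction))
}

# Toy tables (made-up genes and values, not real results)
t1 <- data.frame(Gene.Symbol = c("A", "B", "C"), Fold.Change = c(1.5, -2.0, 1.2))
t2 <- data.frame(Gene.Symbol = c("A", "B", "C"), Fold.Change = c(2.0, -1.1, -1.3))

up_common   <- consistent_genes(list(t1, t2), "up")
down_common <- consistent_genes(list(t1, t2), "down")
```

For RNA-seq tables such as DUM, which store `log2FoldChange` and `hgnc_symbol` instead, the direction test would need a zero cut-off rather than 1 and -1, so this helper would need a small extension before it could cover every dataset.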
The 170-gene seed created a network of 2159 proteins (/familialblood/PPIgenes). EnrichR (http://amp.pharm.mssm.edu/Enrichr/enrich?dataset=3ztlg) shows high enrichment for a number of processes, making it hard to know exactly what the geneset represents. However, for GO Cellular Component the highest enrichments are still for the mitochondria.
In this case, I am somewhat confident that I am looking in the right area. The only way to know whether that is true, however, is to look at whether there is common coexpression.
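To make the coexpression check concrete, here is a minimal sketch of computing pairwise correlations for a genes-by-samples matrix and storing them as one row per gene pair. The script actually run on the cluster is not shown in this log, so the use of Spearman correlation, the function name `pairwise_cor` and the toy data are all assumptions. Only the column names `X` and `reg.mat` are taken from the correlation result files used in this analysis.

```r
# Pairwise Spearman correlation for a genes-by-samples matrix, returned as a
# long table with one row per unordered gene pair (illustrative layout only).
pairwise_cor <- function(expr) {
  rho <- cor(t(expr), method = "spearman")      # genes x genes matrix
  idx <- which(upper.tri(rho), arr.ind = TRUE)  # each pair once, no self-pairs
  data.frame(
    X = paste(rownames(rho)[idx[, 1]], colnames(rho)[idx[, 2]], sep = ":"),
    reg.mat = rho[idx],
    stringsAsFactors = FALSE
  )
}

# Toy data: 4 hypothetical genes measured in 6 samples
set.seed(1)
toy <- matrix(rnorm(24), nrow = 4,
              dimnames = list(paste0("gene", 1:4), paste0("s", 1:6)))
edges <- pairwise_cor(toy)
head(edges)
```

With g genes this produces g(g-1)/2 rows, which is why the real run over a PPI network of a couple of thousand proteins is better suited to a cluster than a laptop.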
#### Parkinson's DEG PPI Correlation ####
#### Familial Blood ####
#Read in network nodes
DEG_PPI <- readLines("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/FamilialBlood/PPIGenes.txt")
setwd("/users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
#Extract PPI network genes from each dataset#####
LEW <- read.csv("LEWfilteredresult.csv")
rownames(LEW) <- LEW$Gene.Symbol
LEW <- LEW[,28:41]
LEW <- subset(LEW, rownames(LEW) %in% DEG_PPI)
MID3 <- read.csv("MID3filteredresult.csv")
rownames(MID3) <- MID3$Gene.Symbol
MID3 <- MID3[,36:49]
MID3 <- subset(MID3, rownames(MID3) %in% DEG_PPI)
MID4 <- read.csv("MID4filteredresult.csv")
rownames(MID4) <- MID4$Gene.Symbol
MID4 <- MID4[,41:55]
MID4 <- subset(MID4, rownames(MID4) %in% DEG_PPI)
MOR.FC <- read.csv("MOR.FCfilteredresult.csv")
rownames(MOR.FC) <- MOR.FC$Gene.Symbol
MOR.FC <- MOR.FC[,24:28]
MOR.FC <- subset(MOR.FC, rownames(MOR.FC) %in% DEG_PPI)
DIJ <- read.csv("DIJfilteredresult.csv")
rownames(DIJ) <- DIJ$Gene.Symbol
DIJ <- DIJ[,29:43]
DIJ <- subset(DIJ, rownames(DIJ) %in% DEG_PPI)
FFR <- read.csv("FFRfilteredresult.csv")
rownames(FFR) <- FFR$Gene.Symbol
FFR <- FFR[,29:44]
FFR <- subset(FFR, rownames(FFR) %in% DEG_PPI)
MID1 <- read.csv("MID1filteredresult.csv")
rownames(MID1) <- MID1$Gene.Symbol
MID1 <- MID1[,28:37]
MID1 <- subset(MID1, rownames(MID1) %in% DEG_PPI)
MID2 <- read.csv("MID2filteredresult.csv")
rownames(MID2) <- MID2$Gene.Symbol
MID2 <- MID2[,39:49]
MID2 <- subset(MID2, rownames(MID2) %in% DEG_PPI)
MOR.SN <- read.csv("MOR.SNfilteredresult.csv")
rownames(MOR.SN) <- MOR.SN$Gene.Symbol
MOR.SN <- MOR.SN[,36:59]
MOR.SN <- subset(MOR.SN, rownames(MOR.SN) %in% DEG_PPI)
DUM <- read.csv("DUM_UniqueGene_DESeq2.csv")
rownames(DUM) <- DUM$hgnc_symbol
DUM <- DUM[,53:81]
DUM <- subset(DUM, rownames(DUM) %in% DEG_PPI)
#Find the gene names that all datasets have in common
DEG_com <- Reduce(intersect, list(rownames(DIJ), rownames(FFR),
rownames(LEW),rownames(MID1),rownames(MID2),
rownames(MID3),rownames(MID4),rownames(MOR.FC),
rownames(MOR.SN), rownames(DUM)))
#Subset each dataset with these common names so they are all the same size
DIJ <- subset(DIJ, rownames(DIJ) %in% DEG_com)
DUM <- subset(DUM, rownames(DUM) %in% DEG_com)
FFR <- subset(FFR, rownames(FFR) %in% DEG_com)
LEW <- subset(LEW, rownames(LEW) %in% DEG_com)
MID1 <- subset(MID1, rownames(MID1) %in% DEG_com)
MID2 <- subset(MID2, rownames(MID2) %in% DEG_com)
MID3 <- subset(MID3, rownames(MID3) %in% DEG_com)
MID4 <- subset(MID4, rownames(MID4) %in% DEG_com)
MOR.FC <- subset(MOR.FC, rownames(MOR.FC) %in% DEG_com)
MOR.SN <- subset(MOR.SN, rownames(MOR.SN) %in% DEG_com)
setwd("/users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/FamilialBlood")
write.csv(DIJ, "DIJ_DEGfilt.csv")
write.csv(DUM, "DUM_DEGfilt.csv")
write.csv(FFR, "FFR_DEGfilt.csv")
write.csv(LEW, "LEW_DEGfilt.csv")
write.csv(MID1, "MID1_DEGfilt.csv")
write.csv(MID2, "MID2_DEGfilt.csv")
write.csv(MID3, "MID3_DEGfilt.csv")
write.csv(MID4, "MID4_DEGfilt.csv")
write.csv(MOR.FC, "MOR.FC_DEGfilt.csv")
write.csv(MOR.SN, "MOR.SN_DEGfilt.csv")
#Run Correlation analysis on Sharc
#### filter correlations #### BOT/BOT2 not included because they have fewer than 4 samples
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/FamilialBlood/")
DIJ <- read.csv("DIJcorresult.csv")
DIJ$Gene1 <- as.character(lapply(strsplit(as.character(DIJ$X), "\\:"), "[", 2))
DIJ$Gene2 <- as.character(lapply(strsplit(as.character(DIJ$X), "\\:"), "[", 1))
DIJ <- DIJ[,c(5,6,1,2,3,4)]
DUM <- read.csv("DUMcorresult.csv")
DUM$Gene1 <- as.character(lapply(strsplit(as.character(DUM$X), "\\:"), "[", 2))
DUM$Gene2 <- as.character(lapply(strsplit(as.character(DUM$X), "\\:"), "[", 1))
DUM <- DUM[,c(5,6,1,2,3,4)]
DUM$X <- paste(DUM$Gene1,":",DUM$Gene2, sep = "")
FFR <- read.csv("FFRcorresult.csv")
FFR$Gene1 <- as.character(lapply(strsplit(as.character(FFR$X), "\\:"), "[", 2))
FFR$Gene2 <- as.character(lapply(strsplit(as.character(FFR$X), "\\:"), "[", 1))
FFR <- FFR[,c(5,6,1,2,3,4)]
FFR$X <- paste(FFR$Gene1,":",FFR$Gene2, sep = "")
LEW <- read.csv("LEWcorresult.csv")
LEW$Gene1 <- as.character(lapply(strsplit(as.character(LEW$X), "\\:"), "[", 2))
LEW$Gene2 <- as.character(lapply(strsplit(as.character(LEW$X), "\\:"), "[", 1))
LEW <- LEW[,c(5,6,1,2,3,4)]
MID1 <- read.csv("MID1corresult.csv")
MID1$Gene1 <- as.character(lapply(strsplit(as.character(MID1$X), "\\:"), "[", 2))
MID1$Gene2 <- as.character(lapply(strsplit(as.character(MID1$X), "\\:"), "[", 1))
MID1 <- MID1[,c(5,6,1,2,3,4)]
MID1$X <- paste(MID1$Gene1,":",MID1$Gene2, sep = "")
MID2 <- read.csv("MID2corresult.csv")
MID2$Gene1 <- as.character(lapply(strsplit(as.character(MID2$X), "\\:"), "[", 2))
MID2$Gene2 <- as.character(lapply(strsplit(as.character(MID2$X), "\\:"), "[", 1))
MID2 <- MID2[,c(5,6,1,2,3,4)]
MID3 <- read.csv("MID3corresult.csv")
MID3$Gene1 <- as.character(lapply(strsplit(as.character(MID3$X), "\\:"), "[", 2))
MID3$Gene2 <- as.character(lapply(strsplit(as.character(MID3$X), "\\:"), "[", 1))
MID3 <- MID3[,c(5,6,1,2,3,4)]
MID4 <- read.csv("MID4corresult.csv")
MID4$Gene1 <- as.character(lapply(strsplit(as.character(MID4$X), "\\:"), "[", 2))
MID4$Gene2 <- as.character(lapply(strsplit(as.character(MID4$X), "\\:"), "[", 1))
MID4 <- MID4[,c(5,6,1,2,3,4)]
MID4$X <- paste(MID4$Gene1,":",MID4$Gene2, sep = "")
MOR.FC<- read.csv("MOR.FCcorresult.csv")
MOR.FC$Gene1 <- as.character(lapply(strsplit(as.character(MOR.FC$X), "\\:"), "[", 2))
MOR.FC$Gene2 <- as.character(lapply(strsplit(as.character(MOR.FC$X), "\\:"), "[", 1))
MOR.FC <- MOR.FC[,c(5,6,1,2,3,4)]
MOR.SN<- read.csv("MOR.SNcorresult.csv")
MOR.SN$Gene1 <- as.character(lapply(strsplit(as.character(MOR.SN$X), "\\:"), "[", 2))
MOR.SN$Gene2 <- as.character(lapply(strsplit(as.character(MOR.SN$X), "\\:"), "[", 1))
MOR.SN <- MOR.SN[,c(5,6,1,2,3,4)]
thresh <- 0.1
### Filter by r value
DIJ_cor.5 <- DIJ[DIJ$reg.mat > thresh | DIJ$reg.mat < -thresh,]
# DUM_cor.5 <- DUM[DUM$reg.mat > thresh | DUM$reg.mat < -thresh,]
FFR_cor.5 <- FFR[FFR$reg.mat > thresh | FFR$reg.mat < -thresh,]
LEW_cor.5 <- LEW[LEW$reg.mat > thresh | LEW$reg.mat < -thresh,]
MID1_cor.5 <- MID1[MID1$reg.mat > thresh | MID1$reg.mat < -thresh,]
MID2_cor.5 <- MID2[MID2$reg.mat > thresh | MID2$reg.mat < -thresh,]
MID3_cor.5 <- MID3[MID3$reg.mat > thresh | MID3$reg.mat < -thresh,]
MID4_cor.5 <- MID4[MID4$reg.mat > thresh | MID4$reg.mat < -thresh,]
# MOR.FC_cor.5 <- MOR.FC[MOR.FC$reg.mat > thresh | MOR.FC$reg.mat < -thresh,]
MOR.SN_cor.5 <- MOR.SN[MOR.SN$reg.mat > thresh | MOR.SN$reg.mat < -thresh,]
### Find matches
Commonedge <- Reduce(intersect, list(DIJ_cor.5$X,FFR_cor.5$X,
LEW_cor.5$X, MID1_cor.5$X, MID2_cor.5$X,
MID3_cor.5$X, MID4_cor.5$X,
MOR.SN_cor.5$X))
#Subset each dataset with these common names so they are all the same size
DIJ_CE <- subset(DIJ_cor.5, DIJ_cor.5$X %in% Commonedge)
DIJ_CE <- DIJ_CE[order(DIJ_CE$X),]
# DUM_CE <- subset(DUM_cor.5, DUM_cor.5$X %in% Commonedge)
# DUM_CE <- DUM_CE[order(DUM_CE$X),]
FFR_CE <- subset(FFR_cor.5, FFR_cor.5$X %in% Commonedge)
FFR_CE <- FFR_CE[order(FFR_CE$X),]
LEW_CE <- subset(LEW_cor.5, LEW_cor.5$X %in% Commonedge)
LEW_CE <- LEW_CE[order(LEW_CE$X),]
MID1_CE <- subset(MID1_cor.5, MID1_cor.5$X %in% Commonedge)
MID1_CE <- MID1_CE[order(MID1_CE$X),]
MID2_CE <- subset(MID2_cor.5, MID2_cor.5$X %in% Commonedge)
MID2_CE <- MID2_CE[order(MID2_CE$X),]
MID3_CE <- subset(MID3_cor.5, MID3_cor.5$X %in% Commonedge)
MID3_CE <- MID3_CE[order(MID3_CE$X),]
MID4_CE <- subset(MID4_cor.5, MID4_cor.5$X %in% Commonedge)
MID4_CE <- MID4_CE[order(MID4_CE$X),]
# MOR.FC_CE <- subset(MOR.FC_cor.5, MOR.FC_cor.5$X %in% Commonedge)
# MOR.FC_CE <- MOR.FC_CE[order(MOR.FC_CE$X),]
MOR.SN_CE <- subset(MOR.SN_cor.5, MOR.SN_cor.5$X %in% Commonedge)
MOR.SN_CE <- MOR.SN_CE[order(MOR.SN_CE$X),]
CommonGroup <- data.frame(row.names = DIJ_CE$X,
DIJ = DIJ_CE$reg.mat,
FFR = FFR_CE$reg.mat,
LEW = LEW_CE$reg.mat,
MID1 = MID1_CE$reg.mat,
MID2 = MID2_CE$reg.mat,
MID3 = MID3_CE$reg.mat,
MID4 = MID4_CE$reg.mat,
MOR.SN = MOR.SN_CE$reg.mat)
CG_conserved_up <- CommonGroup[apply(CommonGroup, MARGIN = 1, function(x) all(x > 0)), ]
CG_conserved_down <- CommonGroup[apply(CommonGroup, MARGIN = 1, function(x) all(x < 0)), ]
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/FamilialBlood/")
CG_samedir <- rbind(CG_conserved_up, CG_conserved_down)
CG_samedir$corMean <- rowMeans(CG_samedir, na.rm = FALSE, dims = 1)
CG_samedir$Gene <- rownames(CG_samedir)
CG_samedir$Gene1 <- as.character(lapply(strsplit(as.character(CG_samedir$Gene), "\\:"), "[", 2))
CG_samedir$Gene2 <- as.character(lapply(strsplit(as.character(CG_samedir$Gene), "\\:"), "[", 1))
write.csv(CG_samedir, "PD_samedir_1.csv", quote = F, row.names = F)
nodetable <- read.csv("PD_samedir_1node.csv")
nuggenes <- as.character(nodetable$name)
PDgenes <- readLines("/Users/clairegreen/Documents/PhD/Parkinsons/ParkinsonsDiseaseMalacards.txt")
celltype <- read.csv("~/Documents/PhD/TDP-43/TDP-43_Code/Results/PPI_Network/Zhang_BrainCelltype_Markers_braingenes.csv")
DEG <- readLines("ALS_sfblood_ALLgenes.txt")
nalls <- readLines("/Users/clairegreen/Documents/PhD/Parkinsons/NallsPDGWAS.txt")
targetvalPD <- c("SIRT2", "DNM1L", "STMN1", "DNAJA3", "TBK1", "RTCA", "ANXA7", "DNAJC12", "RTN4", "ADSL", "MDH1","ATP6V1G2",
"YWHAZ", "ETS1")
nodetable$PDMalacards <- nodetable$shared.name %in% PDgenes
nodetable$DEG <- nodetable$name %in% DEG
nodetable$targetvalPD <- nodetable$shared.name %in% targetvalPD
nodetable$NallsGWAS <- nodetable$shared.name %in% nalls #separate column so the GWAS flag does not overwrite targetvalPD
nodetable_celltype <- merge(celltype, nodetable, by.x = "Gene.symbol", by.y = "shared.name", all = T)
nodetable_celltype <- subset(nodetable_celltype, !is.na(SUID)) #drop cell-type rows with no matching network node
nodetable_celltype$targetvalPD <- nodetable_celltype$Gene.symbol %in% targetvalPD
write.csv(nodetable_celltype, "ModifiedPDNodetable.csv", row.names = F)
This analysis didn’t go as well as the TDP-43 case. The correlations were much more variable, meaning I had to reduce the threshold from 0.5 to 0.1. Additionally, I removed four datasets: the LRRK2 datasets couldn’t be used because they only had 2 and 3 samples, so correlation would never be calculated properly; DUM was an RNA-seq dataset, so I was losing annotation coverage; and MOR.FC showed a really odd rho value distribution…
Looking back, this is probably because it only has 3 control samples.
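The sample-size problem can be shown directly. Assuming the correlation step reports Spearman's rho (the cluster script is not shown here), a rank correlation on a handful of samples can only take a few discrete values, so its distribution will look odd however the genes behave. The sketch below enumerates every attainable value for a given number of untied samples; `permutations` and `possible_rho` are hypothetical helper names.

```r
# All permutations of a vector, built recursively (base R only).
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in permutations(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

# Distinct Spearman rho values attainable with n untied samples.
possible_rho <- function(n) {
  x <- seq_len(n)
  rhos <- vapply(permutations(x),
                 function(y) cor(x, y, method = "spearman"),
                 numeric(1))
  sort(unique(round(rhos, 10)))
}

possible_rho(3)          # values: -1, -0.5, 0.5, 1
length(possible_rho(5))  # more distinct values, but still a coarse grid
```

With three samples any threshold between 0.5 and 1 keeps only perfectly ranked pairs, which is consistent with dropping the very small datasets from the correlation step.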
Using these 8 datasets with a threshold of 0.1 generated a network containing 279 nodes and 292 edges. Within this network was one clear largest connected component of 208 nodes and 249 edges.
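The edge filter behind these counts can be written as one function, which makes it easier to compare thresholds. This is a sketch under assumptions: `conserved_edges` and the toy tables are hypothetical, and each input table is assumed to have the `X` (gene pair) and `reg.mat` (correlation) columns used in the filtering script. An edge is kept if it passes the absolute threshold in every dataset and has the same sign in all of them.

```r
# Keep edges that pass |r| > thresh in every dataset and agree in sign.
# cor_tables: named list of data frames with columns X and reg.mat.
conserved_edges <- function(cor_tables, thresh) {
  kept <- lapply(cor_tables, function(d) d[abs(d$reg.mat) > thresh, ])
  common <- Reduce(intersect, lapply(kept, function(d) d$X))
  # One column of r values per dataset, rows aligned on the common edges
  mat <- do.call(cbind, lapply(kept, function(d) d$reg.mat[match(common, d$X)]))
  rownames(mat) <- common
  if (length(common) == 0) return(mat)
  same_dir <- apply(mat, 1, function(x) all(x > 0) || all(x < 0))
  mat[same_dir, , drop = FALSE]
}

# Toy correlation tables (made-up values, not real results)
edge_ids <- c("g1:g2", "g1:g3", "g2:g3")
toy_tables <- list(
  A = data.frame(X = edge_ids, reg.mat = c(0.30, -0.25, 0.15), stringsAsFactors = FALSE),
  B = data.frame(X = edge_ids, reg.mat = c(0.22,  0.40, 0.12), stringsAsFactors = FALSE)
)

net_0.1 <- conserved_edges(toy_tables, thresh = 0.1)
net_0.2 <- conserved_edges(toy_tables, thresh = 0.2)
nrow(net_0.1)  # 2 edges survive at 0.1
nrow(net_0.2)  # 1 edge survives at 0.2
```

Because every dataset has to agree, each added dataset or threshold increase can only shrink the network, so the edge count is very sensitive to the noisiest dataset in the list.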
If I increase the threshold to 0.2, the resulting network is much smaller:
http://amp.pharm.mssm.edu/Enrichr/enrich?dataset=3zurx
The enrichments suggest that there is an immune process signal. There is also still an enrichment in the mitochondria for GO Cellular Component.
I realised after looking more closely at the paper that the LEW samples had some olivary nucleus samples which were supposed to be unaffected controls. Removing these samples meant I needed to essentially run everything again (again).
Rerunning left 210 genes this time (ALS_sfblood_ALLgenes.txt).
> intersect(resultsup, PDgenes)
[1] "MSX1" "SPR"
> intersect(resultsdown, PDgenes)
[1] "SNCA" "ALDH1A1" "UCHL1" "CYCS"
Functional enrichment was run in EnrichR for the upregulated and downregulated genes. The upregulated signal isn’t entirely clear: a number of signalling and DNA-binding pathways are enriched, but not with the strongest p values. In contrast, there is quite a strong signal in the downregulated genes. A number of neurodegenerative pathways (including Parkinson’s disease) are enriched, as well as the electron transport chain/mitochondrial processes (mitochondrion (GO:0005739), padj = 4.711e-8, 23 genes). GO Biological Process also implicates autophagy, which highlights the presence of CHMP2B, a causative ALS/FTLD gene.
| Symbol | Description | Category | GIFTs | Matched Phenotypes | Matched Phenotypes Scount | Global Rank (Total Genes 11967) | Score |
|---|---|---|---|---|---|---|---|
| SNCA | Synuclein Alpha | Protein Coding | 61 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 2 | 378.27 |
| UCHL1 | Ubiquitin C-Terminal Hydrolase L1 | Protein Coding | 62 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 7 | 209.64 |
| CYCS | Cytochrome C, Somatic | Protein Coding | 57 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 63 | 63.23 |
| NDUFV2 | NADH:Ubiquinone Oxidoreductase Core Subunit V2 | Protein Coding | 53 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 106 | 51.16 |
| ALDH1A1 | Aldehyde Dehydrogenase 1 Family Member A1 | Protein Coding | 57 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 153 | 42.23 |
| SPR | Sepiapterin Reductase | Protein Coding | 58 | Parkinson’s Disease, LRRK2, late-onset | 3 | 126 | 40.38 |
| MSX1 | Msh Homeobox 1 | Protein Coding | 55 | Parkinson’s Disease, LRRK2, late-onset | 3 | 169 | 33.61 |
| NDUFS1 | NADH:Ubiquinone Oxidoreductase Core Subunit S1 | Protein Coding | 56 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 213 | 28.87 |
| NDUFS3 | NADH:Ubiquinone Oxidoreductase Core Subunit S3 | Protein Coding | 56 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 236 | 26.54 |
| SYT11 | Synaptotagmin 11 | Protein Coding | 50 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 239 | 22.93 |
| CASP7 | Caspase 7 | Protein Coding | 59 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 309 | 21.45 |
| TOMM20 | Translocase Of Outer Mitochondrial Membrane 20 | Protein Coding | 48 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 362 | 16.56 |
| MFN2 | Mitofusin 2 | Protein Coding | 59 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 424 | 16.55 |
| IGF1R | Insulin Like Growth Factor 1 Receptor | Protein Coding | 66 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 435 | 16.27 |
| MAP1B | Microtubule Associated Protein 1B | Protein Coding | 51 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 474 | 15.18 |
| TMEM126B | Transmembrane Protein 126B | Protein Coding | 47 | Parkinson’s Disease, LRRK2 | 2 | 363 | 13.45 |
| PTS | 6-Pyruvoyltetrahydropterin Synthase | Protein Coding | 57 | Parkinson’s Disease, late-onset, neurodegeneration | 3 | 556 | 11.88 |
| CASP6 | Caspase 6 | Protein Coding | 57 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 573 | 11.74 |
| RAB7A | RAB7A, Member RAS Oncogene Family | Protein Coding | 58 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 759 | 11.00 |
| PPP2R2B | Protein Phosphatase 2 Regulatory Subunit Bbeta | Protein Coding | 55 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 788 | 10.69 |
| PYROXD1 | Pyridine Nucleotide-Disulphide Oxidoreductase Domain 1 | Protein Coding | 44 | Parkinson’s Disease, late-onset | 2 | 487 | 10.60 |
| AGRP | Agouti Related Neuropeptide | Protein Coding | 51 | late-onset, neurodegeneration | 2 | 504 | 10.27 |
| AIFM1 | Apoptosis Inducing Factor Mitochondria Associated 1 | Protein Coding | 59 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 898 | 9.70 |
| FLT1 | Fms Related Tyrosine Kinase 1 | Protein Coding | 62 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 952 | 9.26 |
| RTN4 | Reticulon 4 | Protein Coding | 54 | LRRK2, late-onset, neurodegeneration | 3 | 792 | 9.21 |
| DYNC1H1 | Dynein Cytoplasmic 1 Heavy Chain 1 | Protein Coding | 54 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 965 | 9.11 |
| TUBA4A | Tubulin Alpha 4a | Protein Coding | 56 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 968 | 9.09 |
| CHMP2B | Charged Multivesicular Body Protein 2B | Protein Coding | 53 | Parkinson’s Disease, late-onset, neurodegeneration | 3 | 894 | 8.43 |
| RGS2 | Regulator Of G Protein Signaling 2 | Protein Coding | 51 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 944 | 8.07 |
| HIST1H1C | Histone Cluster 1 H1 Family Member C | Protein Coding | 48 | Parkinson’s Disease | 1 | 520 | 7.11 |
| PAWR | Pro-Apoptotic WT1 Regulator | Protein Coding | 48 | Parkinson’s Disease, neurodegeneration | 2 | 982 | 6.36 |
| MS4A6A | Membrane Spanning 4-Domains A6A | Protein Coding | 45 | Parkinson’s Disease, late-onset, neurodegeneration | 3 | 1259 | 6.26 |
| MYO1C | Myosin IC | Protein Coding | 53 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 1362 | 5.78 |
| SUMO1 | Small Ubiquitin-Like Modifier 1 | Protein Coding | 56 | Parkinson’s Disease, neurodegeneration | 2 | 1277 | 5.04 |
| ACLY | ATP Citrate Lyase | Protein Coding | 53 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 1577 | 4.91 |
| SDHA | Succinate Dehydrogenase Complex Flavoprotein Subunit A | Protein Coding | 56 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 1850 | 4.77 |
| PGF | Placental Growth Factor | Protein Coding | 54 | LRRK2, late-onset | 2 | 1361 | 4.72 |
| TRADD | TNFRSF1A Associated Via Death Domain | Protein Coding | 51 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 1694 | 4.57 |
| HSPA1L | Heat Shock Protein Family A (Hsp70) Member 1 Like | Protein Coding | 52 | Parkinson’s Disease, LRRK2 | 2 | 1457 | 4.38 |
| GABBR2 | Gamma-Aminobutyric Acid Type B Receptor Subunit 2 | Protein Coding | 57 | Parkinson’s Disease, LRRK2, late-onset | 3 | 1880 | 4.04 |
| PGAM1 | Phosphoglycerate Mutase 1 | Protein Coding | 54 | Parkinson’s Disease, neurodegeneration | 2 | 1602 | 3.95 |
| ATP6V1E1 | ATPase H+ Transporting V1 Subunit E1 | Protein Coding | 52 | Parkinson’s Disease, LRRK2, late-onset | 3 | 2008 | 3.70 |
| FZD5 | Frizzled Class Receptor 5 | Protein Coding | 57 | Parkinson’s Disease, LRRK2 | 2 | 1718 | 3.67 |
| MRPS22 | Mitochondrial Ribosomal Protein S22 | Protein Coding | 50 | Parkinson’s Disease, neurodegeneration | 2 | 1776 | 3.54 |
| PLIN2 | Perilipin 2 | Protein Coding | 51 | late-onset, neurodegeneration | 2 | 1805 | 3.47 |
| CIRBP | Cold Inducible RNA Binding Protein | Protein Coding | 48 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 2119 | 3.38 |
| SNX10 | Sorting Nexin 10 | Protein Coding | 49 | Parkinson’s Disease, LRRK2 | 2 | 1913 | 3.22 |
| FABP3 | Fatty Acid Binding Protein 3 | Protein Coding | 54 | Parkinson’s Disease, neurodegeneration | 2 | 1947 | 3.13 |
| BCAT2 | Branched Chain Amino Acid Transaminase 2 | Protein Coding | 53 | Parkinson’s Disease, neurodegeneration | 2 | 1987 | 3.05 |
| ADH1B | Alcohol Dehydrogenase 1B (Class I), Beta Polypeptide | Protein Coding | 53 | Parkinson’s Disease, late-onset | 2 | 2004 | 3.03 |
| RARRES2 | Retinoic Acid Receptor Responder 2 | Protein Coding | 50 | late-onset | 1 | 1536 | 2.92 |
| DNAJC24 | DnaJ Heat Shock Protein Family (Hsp40) Member C24 | Protein Coding | 44 | LRRK2, late-onset | 2 | 2118 | 2.76 |
| PRPF4 | Pre-MRNA Processing Factor 4 | Protein Coding | 51 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 2366 | 2.55 |
| FLCN | Folliculin | Protein Coding | 50 | Parkinson’s Disease, late-onset | 2 | 2227 | 2.46 |
| COPS5 | COP9 Signalosome Subunit 5 | Protein Coding | 51 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 2450 | 2.31 |
| ENOPH1 | Enolase-Phosphatase 1 | Protein Coding | 49 | Parkinson’s Disease | 1 | 2186 | 1.83 |
| ZC3HAV1 | Zinc Finger CCCH-Type Containing, Antiviral 1 | Protein Coding | 46 | Parkinson’s Disease, neurodegeneration | 2 | 2481 | 1.81 |
| PLCG1 | Phospholipase C Gamma 1 | Protein Coding | 56 | Parkinson’s Disease, LRRK2 | 2 | 2492 | 1.80 |
| STAB1 | Stabilin 1 | Protein Coding | 48 | Parkinson’s Disease | 1 | 2218 | 1.76 |
| HP1BP3 | Heterochromatin Protein 1 Binding Protein 3 | Protein Coding | 44 | Parkinson’s Disease, neurodegeneration | 2 | 2528 | 1.71 |
| MLLT11 | MLLT11, Transcription Factor 7 Cofactor | Protein Coding | 45 | late-onset | 1 | 2261 | 1.68 |
| AGL | Amylo-Alpha-1, 6-Glucosidase, 4-Alpha-Glucanotransferase | Protein Coding | 55 | late-onset | 1 | 2261 | 1.68 |
| MAVS | Mitochondrial Antiviral Signaling Protein | Protein Coding | 50 | late-onset | 1 | 2261 | 1.68 |
| TAZ | Tafazzin | Protein Coding | 52 | late-onset | 1 | 2261 | 1.68 |
| SLC6A1 | Solute Carrier Family 6 Member 1 | Protein Coding | 57 | Parkinson’s Disease, neurodegeneration | 2 | 2545 | 1.67 |
| STAT4 | Signal Transducer And Activator Of Transcription 4 | Protein Coding | 56 | LRRK2, late-onset | 2 | 2559 | 1.63 |
| EIF4E | Eukaryotic Translation Initiation Factor 4E | Protein Coding | 59 | Parkinson’s Disease, LRRK2 | 2 | 2565 | 1.62 |
| NIPSNAP1 | Nipsnap Homolog 1 | Protein Coding | 49 | Parkinson’s Disease, LRRK2 | 2 | 2565 | 1.62 |
| RAD51C | RAD51 Paralog C | Protein Coding | 51 | Parkinson’s Disease, LRRK2 | 2 | 2573 | 1.60 |
| NCALD | Neurocalcin Delta | Protein Coding | 51 | Parkinson’s Disease, LRRK2 | 2 | 2573 | 1.60 |
| LRRC1 | Leucine Rich Repeat Containing 1 | Protein Coding | 44 | Parkinson’s Disease, LRRK2 | 2 | 2573 | 1.60 |
| PKNOX1 | PBX/Knotted 1 Homeobox 1 | Protein Coding | 47 | Parkinson’s Disease, LRRK2 | 2 | 2573 | 1.60 |
| GABARAPL1 | GABA Type A Receptor Associated Protein Like 1 | Protein Coding | 53 | Parkinson’s Disease, neurodegeneration | 2 | 2609 | 1.51 |
| MAP1LC3B | Microtubule Associated Protein 1 Light Chain 3 Beta | Protein Coding | 52 | Parkinson’s Disease, neurodegeneration | 2 | 2611 | 1.50 |
| LYVE1 | Lymphatic Vessel Endothelial Hyaluronan Receptor 1 | Protein Coding | 51 | late-onset, neurodegeneration | 2 | 2651 | 1.41 |
| DYNC1I1 | Dynein Cytoplasmic 1 Intermediate Chain 1 | Protein Coding | 48 | Parkinson’s Disease, neurodegeneration | 2 | 2666 | 1.37 |
| FOXO4 | Forkhead Box O4 | Protein Coding | 51 | Parkinson’s Disease, LRRK2 | 2 | 2667 | 1.37 |
| NME3 | NME/NM23 Nucleoside Diphosphate Kinase 3 | Protein Coding | 51 | Parkinson’s Disease, neurodegeneration | 2 | 2680 | 1.33 |
| SERTAD3 | SERTA Domain Containing 3 | Protein Coding | 38 | neurodegeneration | 1 | 2476 | 1.29 |
| USP8 | Ubiquitin Specific Peptidase 8 | Protein Coding | 57 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 2776 | 1.27 |
| ATG14 | Autophagy Related 14 | Protein Coding | 43 | Parkinson’s Disease | 1 | 2551 | 1.17 |
| LDLRAD4 | Low Density Lipoprotein Receptor Class A Domain Containing 4 | Protein Coding | 40 | Parkinson’s Disease | 1 | 2551 | 1.17 |
| TAF4 | TATA-Box Binding Protein Associated Factor 4 | Protein Coding | 48 | LRRK2, neurodegeneration | 2 | 2778 | 1.03 |
| RBL1 | RB Transcriptional Corepressor Like 1 | Protein Coding | 51 | LRRK2, late-onset | 2 | 2802 | 0.95 |
| IARS2 | Isoleucyl-TRNA Synthetase 2, Mitochondrial | Protein Coding | 49 | Parkinson’s Disease | 1 | 2681 | 0.94 |
| KLHL36 | Kelch Like Family Member 36 | Protein Coding | 41 | late-onset | 1 | 2683 | 0.93 |
| ST3GAL5 | ST3 Beta-Galactoside Alpha-2,3-Sialyltransferase 5 | Protein Coding | 55 | late-onset | 1 | 2706 | 0.88 |
| CLIC2 | Chloride Intracellular Channel 2 | Protein Coding | 49 | late-onset | 1 | 2706 | 0.88 |
| DDX42 | DEAD-Box Helicase 42 | Protein Coding | 46 | Parkinson’s Disease | 1 | 2711 | 0.87 |
| ANXA6 | Annexin A6 | Protein Coding | 53 | Parkinson’s Disease | 1 | 2711 | 0.87 |
| CCDC40 | Coiled-Coil Domain Containing 40 | Protein Coding | 45 | Parkinson’s Disease | 1 | 2711 | 0.87 |
| LAG3 | Lymphocyte Activating 3 | Protein Coding | 44 | Parkinson’s Disease | 1 | 2711 | 0.87 |
| SDCCAG3 | Serologically Defined Colon Cancer Antigen 3 | Protein Coding | 41 | Parkinson’s Disease | 1 | 2711 | 0.87 |
| BCAR3 | BCAR3, NSP Family Adaptor Protein | Protein Coding | 49 | Parkinson’s Disease | 1 | 2711 | 0.87 |
| IMP4 | IMP4, U3 Small Nucleolar Ribonucleoprotein | Protein Coding | 42 | Parkinson’s Disease | 1 | 2711 | 0.87 |
| LETMD1 | LETM1 Domain Containing 1 | Protein Coding | 43 | Parkinson’s Disease | 1 | 2711 | 0.87 |
| PTP4A1 | Protein Tyrosine Phosphatase Type IVA, Member 1 | Protein Coding | 51 | Parkinson’s Disease | 1 | 2711 | 0.87 |
| RCC1 | Regulator Of Chromosome Condensation 1 | Protein Coding | 49 | LRRK2, neurodegeneration | 2 | 2829 | 0.86 |
| TXNL1 | Thioredoxin Like 1 | Protein Coding | 47 | late-onset | 1 | 2741 | 0.79 |
| CD84 | CD84 Molecule | Protein Coding | 51 | late-onset | 1 | 2806 | 0.66 |
| DUSP9 | Dual Specificity Phosphatase 9 | Protein Coding | 50 | LRRK2 | 1 | 2807 | 0.66 |
| FHIT | Fragile Histidine Triad | Protein Coding | 54 | Parkinson’s Disease, late-onset | 2 | 2896 | 0.61 |
| ANGPT2 | Angiopoietin 2 | Protein Coding | 55 | LRRK2, late-onset | 2 | 2901 | 0.58 |
| INHBB | Inhibin Beta B Subunit | Protein Coding | 53 | LRRK2, late-onset | 2 | 2901 | 0.58 |
| CALML4 | Calmodulin Like 4 | Protein Coding | 41 | LRRK2 | 1 | 2853 | 0.54 |
| HDAC7 | Histone Deacetylase 7 | Protein Coding | 52 | LRRK2 | 1 | 2853 | 0.54 |
| ATP2B2 | ATPase Plasma Membrane Ca2+ Transporting 2 | Protein Coding | 54 | LRRK2, neurodegeneration | 2 | 2915 | 0.52 |
| PPP2R1B | Protein Phosphatase 2 Scaffold Subunit Abeta | Protein Coding | 57 | LRRK2 | 1 | 2905 | 0.39 |
| RGS4 | Regulator Of G Protein Signaling 4 | Protein Coding | 52 | LRRK2 | 1 | 2905 | 0.39 |
| KPNA1 | Karyopherin Subunit Alpha 1 | Protein Coding | 51 | Parkinson’s Disease, neurodegeneration | 2 | 2942 | 0.35 |
| NRXN3 | Neurexin 3 | Protein Coding | 51 | neurodegeneration | 1 | 2920 | 0.34 |
| SARS | Seryl-TRNA Synthetase | Protein Coding | 53 | neurodegeneration | 1 | 2920 | 0.34 |
| PGRMC1 | Progesterone Receptor Membrane Component 1 | Protein Coding | 53 | neurodegeneration | 1 | 2920 | 0.34 |
| RDH16 | Retinol Dehydrogenase 16 (All-Trans) | Protein Coding | 44 | LRRK2 | 1 | 2936 | 0.28 |
| SMAD6 | SMAD Family Member 6 | Protein Coding | 54 | LRRK2 | 1 | 2936 | 0.28 |
| NFATC4 | Nuclear Factor Of Activated T Cells 4 | Protein Coding | 53 | LRRK2 | 1 | 2936 | 0.28 |
| REEP5 | Receptor Accessory Protein 5 | Protein Coding | 46 | LRRK2 | 1 | 2936 | 0.28 |
| EFNA4 | Ephrin A4 | Protein Coding | 50 | LRRK2 | 1 | 2936 | 0.28 |
| MST1 | Macrophage Stimulating 1 | Protein Coding | 55 | LRRK2 | 1 | 2936 | 0.28 |
| FSTL3 | Follistatin Like 3 | Protein Coding | 48 | LRRK2 | 1 | 2936 | 0.28 |
| LAMA5 | Laminin Subunit Alpha 5 | Protein Coding | 51 | LRRK2 | 1 | 2936 | 0.28 |
| PRC1 | Protein Regulator Of Cytokinesis 1 | Protein Coding | 47 | LRRK2 | 1 | 2936 | 0.28 |
| LAMB2 | Laminin Subunit Beta 2 | Protein Coding | 55 | LRRK2 | 1 | 2936 | 0.28 |
| ZNF629 | Zinc Finger Protein 629 | Protein Coding | 38 | LRRK2 | 1 | 2938 | 0.27 |
| NR6A1 | Nuclear Receptor Subfamily 6 Group A Member 1 | Protein Coding | 50 | LRRK2 | 1 | 2938 | 0.27 |
| ATP8A2 | ATPase Phospholipid Transporting 8A2 | Protein Coding | 50 | LRRK2 | 1 | 2938 | 0.27 |
| AREL1 | Apoptosis Resistant E3 Ubiquitin Protein Ligase 1 | Protein Coding | 38 | LRRK2 | 1 | 2938 | 0.27 |
| ZNF692 | Zinc Finger Protein 692 | Protein Coding | 40 | LRRK2 | 1 | 2938 | 0.27 |
| ROBO3 | Roundabout Guidance Receptor 3 | Protein Coding | 52 | LRRK2 | 1 | 2938 | 0.27 |
| ACAT2 | Acetyl-CoA Acetyltransferase 2 | Protein Coding | 55 | LRRK2 | 1 | 2938 | 0.27 |
| STX2 | Syntaxin 2 | Protein Coding | 48 | LRRK2 | 1 | 2938 | 0.27 |
| TCF3 | Transcription Factor 3 | Protein Coding | 55 | LRRK2 | 1 | 2938 | 0.27 |
| RAB6B | RAB6B, Member RAS Oncogene Family | Protein Coding | 46 | LRRK2 | 1 | 2938 | 0.27 |
| DOCK6 | Dedicator Of Cytokinesis 6 | Protein Coding | 50 | LRRK2 | 1 | 2938 | 0.27 |
| INSIG2 | Insulin Induced Gene 2 | Protein Coding | 47 | LRRK2 | 1 | 2938 | 0.27 |
| DNAJB9 | DnaJ Heat Shock Protein Family (Hsp40) Member B9 | Protein Coding | 45 | LRRK2 | 1 | 2938 | 0.27 |
| HLTF | Helicase Like Transcription Factor | Protein Coding | 48 | LRRK2 | 1 | 2938 | 0.27 |
| VPS4B | Vacuolar Protein Sorting 4 Homolog B | Protein Coding | 51 | Parkinson’s Disease | 1 | 2952 | 0.18 |
| SLC23A2 | Solute Carrier Family 23 Member 2 | Protein Coding | 52 | Parkinson’s Disease | 1 | 2952 | 0.18 |
| TULP2 | Tubby Like Protein 2 | Protein Coding | 44 | late-onset | 1 | 2955 | 0.13 |
| ALAS1 | 5’-Aminolevulinate Synthase 1 | Protein Coding | 53 | late-onset | 1 | 2955 | 0.13 |
| PTBP2 | Polypyrimidine Tract Binding Protein 2 | Protein Coding | 47 | neurodegeneration | 1 | 2959 | 0.07 |
| ZMYND8 | Zinc Finger MYND-Type Containing 8 | Protein Coding | 46 | neurodegeneration | 1 | 2959 | 0.07 |
| WDYHV1 | WDYHV Motif Containing 1 | Protein Coding | 39 | neurodegeneration | 1 | 2959 | 0.07 |
| FAM193B | Family With Sequence Similarity 193 Member B | Protein Coding | 42 | neurodegeneration | 1 | 2959 | 0.07 |
| SEZ6L2 | Seizure Related 6 Homolog Like 2 | Protein Coding | 46 | neurodegeneration | 1 | 2959 | 0.07 |
Using the 210 DEGs I seeded a network of 2429 nodes and 4097 edges.
library(biomaRt)
# Load the PPI reference table and the brain cell-type marker list
setwd("/Users/clairegreen/Documents/PhD/TDP-43/TDP-43_Code/Results/PPI_Network/")
PPI <- read.table("iref14_Human_UP_noDup_table_nodash.txt", header = TRUE)
braingenes <- read.csv("Zhang_BrainCelltype_Markers_braingenes.csv", header = TRUE)
# Read the DEG list (used below as HGNC symbols)
DEG_list <- readLines("~/Documents/PhD/Parkinsons/Parkinsons_Code/Results/FamilialBlood/ALS_sfblood_ALLgenes.txt")
# Map HGNC symbols to UniProt/Swiss-Prot accessions via Ensembl BioMart
mart <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl", host = "www.ensembl.org")
attributes <- listAttributes(mart)
mart_back <- getBM(attributes = c("hgnc_symbol", "uniprotswissprot"), filters = "hgnc_symbol", values = DEG_list, mart = mart)
# Drop symbols that have no Swiss-Prot accession
genelist_Uniprot <- subset(mart_back, uniprotswissprot != "")
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/FamilialBlood/")
write.csv(genelist_Uniprot, "martback.csv", row.names = FALSE)
# IDENTIFY MISSING GENES AND FIND UNIPROT CODES FOR THEM. NB SOME GENES MAY NOT BE PROTEIN CODING #
mart_table <- read.csv("martback.csv", header = TRUE) # A table with the UniProt codes for the DEGs
uniprot_gene <- mart_table$uniprotswissprot
# Keep every interaction in which at least one partner is a DEG
DEG_PPI <- subset(PPI, V1 %in% uniprot_gene | V2 %in% uniprot_gene)
# Reset row names to 1..n (also safe when the subset is empty)
rownames(DEG_PPI) <- NULL
write.csv(DEG_PPI, "DEG_PPI_famBlood.csv", row.names = FALSE)
## Convert UniProt IDs to HGNC symbols. biomaRt jumbles the output order, so instead
## go to https://biodbnet-abcc.ncifcrf.gov/db/db2db.php and submit each column of IDs. Select "UniProt Accession" for input and "Gene Symbol" for output.
# Select "NO" for "remove duplicate input values". The code below expects the converted symbols in columns Gene1 and Gene2.
DEG_PPI <- read.csv("DEG_PPI_famBlood.csv")
# Remove interactions where either partner has no gene symbol ("-")
DEG_PPI <- subset(DEG_PPI, Gene1 != "-" & Gene2 != "-")
write.csv(DEG_PPI, "FinalPDPPI.csv", row.names = FALSE, quote = FALSE)
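As a sanity check on the node and edge counts quoted for the seeded network, the size of an edge list can be recomputed directly. This is only a minimal sketch: it assumes the `Gene1` and `Gene2` columns used in the filtering step, treats interactions as undirected, and ignores self-interactions, which may not match how the original counts were obtained. The helper `network_size` and the toy data frame are hypothetical.

```r
# Hypothetical helper: count unique nodes and undirected edges in an edge list.
network_size <- function(edges) {
  g1 <- as.character(edges$Gene1)
  g2 <- as.character(edges$Gene2)
  # Sort each pair so that A-B and B-A collapse to the same interaction.
  pairs <- unique(data.frame(a = pmin(g1, g2), b = pmax(g1, g2),
                             stringsAsFactors = FALSE))
  # Drop self-interactions so they are not counted as edges.
  pairs <- pairs[pairs$a != pairs$b, , drop = FALSE]
  list(nodes = length(unique(c(pairs$a, pairs$b))), edges = nrow(pairs))
}

# Toy edge list standing in for FinalPDPPI.csv (assumed data, illustration only).
toy <- data.frame(
  Gene1 = c("PARK7", "SNCA", "PARK7", "UCHL1"),
  Gene2 = c("SNCA", "PARK7", "UCHL1", "UCHL1"),
  stringsAsFactors = FALSE
)
size <- network_size(toy)
print(size)
```

On the real data the call would be `network_size(read.csv("FinalPDPPI.csv"))`.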
I ran common coexpression again using Sharc. DUM was again removed because it had ~100,000 fewer edges than the other datasets. I then plotted the distributions of rho values to check they were normally distributed:
From this it is clear that the LEW and MOR.FC datasets do not have enough power to produce a normal distribution of coexpression values. These two datasets were therefore removed from the common coexpression edge analysis.
When looking for common edges, I tried three different minimum rho values: 0.5, 0.4, and 0.3.
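For reference, the kind of check described here can be sketched as follows. This assumes rho is a Spearman correlation between gene pairs within one dataset; the simulated expression matrix is a stand-in, not real data, and the skewness summary is just one possible way to judge the shape of the distribution alongside a plot.

```r
set.seed(1)
# Simulated expression matrix: 50 genes (rows) x 12 samples (columns). Assumed data.
expr <- matrix(rnorm(50 * 12), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("s", 1:12)))

# Spearman correlation between every pair of genes
# (cor() correlates columns, so the matrix is transposed first).
rho_mat <- cor(t(expr), method = "spearman")

# Keep each gene pair once by taking the upper triangle.
rho <- rho_mat[upper.tri(rho_mat)]

# Shape summary: a roughly normal distribution has skewness near 0.
skewness <- mean((rho - mean(rho))^3) / sd(rho)^3
cat("pairs:", length(rho), " mean:", round(mean(rho), 3),
    " skewness:", round(skewness, 3), "\n")

# Binned counts of rho; use hist(rho) and qqnorm(rho) interactively to draw the plots.
h <- hist(rho, breaks = seq(-1, 1, by = 0.1), plot = FALSE)
print(h$counts)
```

A dataset with few samples tends to give a wide, lumpy histogram here, because each rho is estimated from only a handful of ranks.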
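The effect of the minimum rho threshold can be illustrated with a small sketch. The definition of a common edge used here is an assumption: a gene pair is kept if its |rho| is at least the threshold in every dataset and the sign of the correlation agrees across datasets. The helper names and the toy matrices are hypothetical.

```r
# Hypothetical helper: gene pairs passing min_rho in every dataset, with a consistent sign.
common_edges <- function(rho_list, min_rho) {
  genes <- Reduce(intersect, lapply(rho_list, rownames))
  mats <- lapply(rho_list, function(m) m[genes, genes])
  strong <- Reduce(`&`, lapply(mats, function(m) abs(m) >= min_rho))
  signs <- lapply(mats, sign)
  same_sign <- Reduce(`&`, lapply(signs, function(s) s == signs[[1]]))
  # The upper triangle keeps each pair once and excludes the diagonal.
  idx <- which(strong & same_sign & upper.tri(strong), arr.ind = TRUE)
  data.frame(Gene1 = genes[idx[, "row"]], Gene2 = genes[idx[, "col"]],
             stringsAsFactors = FALSE)
}

# Toy symmetric rho matrices for three genes (assumed values, illustration only).
make_rho <- function(ab, ac, bc) {
  m <- diag(3)
  dimnames(m) <- list(c("A", "B", "C"), c("A", "B", "C"))
  m["A", "B"] <- m["B", "A"] <- ab
  m["A", "C"] <- m["C", "A"] <- ac
  m["B", "C"] <- m["C", "B"] <- bc
  m
}
d1 <- make_rho(ab = 0.60, ac = 0.45, bc = -0.50)
d2 <- make_rho(ab = 0.55, ac = 0.35, bc = 0.50)

thresholds <- c(0.5, 0.4, 0.3)
n_common <- sapply(thresholds, function(thr) nrow(common_edges(list(d1, d2), thr)))
print(data.frame(min_rho = thresholds, common_edges = n_common))
```

In the toy data the B-C pair is never kept because its sign flips between datasets, while A-C only appears once the threshold drops to 0.3.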
The largest connected component consists of 82 genes and 95 edges.
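One way the largest connected component could be extracted from an edge list is sketched below in base R, using simple label propagation. This is an illustration under assumptions (an undirected edge list with `Gene1` and `Gene2` columns), not necessarily the tool used for the result above; with the igraph package installed, its `components()` function would give the same grouping.

```r
# Hypothetical helper: genes and edges of the largest connected component.
largest_component <- function(edges) {
  g1 <- as.character(edges$Gene1)
  g2 <- as.character(edges$Gene2)
  nodes <- unique(c(g1, g2))
  # Start with every node in its own component, labelled by its index.
  label <- seq_along(nodes)
  names(label) <- nodes
  repeat {
    # Each edge proposes the smaller of its two endpoint labels.
    low <- pmin(label[g1], label[g2])
    # Each node takes the smallest label proposed by any of its edges.
    upd <- tapply(c(low, low), c(g1, g2), min)
    new_label <- label
    new_label[names(upd)] <- pmin(label[names(upd)], as.vector(upd))
    if (all(new_label == label)) break
    label <- new_label
  }
  sizes <- table(label)
  top <- as.integer(names(sizes)[which.max(sizes)])
  members <- names(label)[label == top]
  in_lcc <- g1 %in% members & g2 %in% members
  list(genes = members, edges = edges[in_lcc, , drop = FALSE])
}

# Toy network: a triangle A-B-C plus a separate pair D-E (assumed data).
toy <- data.frame(Gene1 = c("A", "B", "C", "D"),
                  Gene2 = c("B", "C", "A", "E"),
                  stringsAsFactors = FALSE)
lcc <- largest_component(toy)
cat(length(lcc$genes), "genes and", nrow(lcc$edges), "edges in the LCC\n")
```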
Here we can see that 25 LCC proteins form 45 PPI relationships with 25 PD MalaCards genes (with one overlap, PARK7).
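Counts of this kind can be reproduced from a PPI edge list and two gene sets. The sketch below is hypothetical: the helper `link_summary`, the column names `Gene1` and `Gene2`, and the toy genes are assumptions for illustration and do not reproduce the figures above.

```r
# Hypothetical helper: summarise PPI links that join two gene sets.
link_summary <- function(ppi, set_a, set_b) {
  a_to_b <- ppi$Gene1 %in% set_a & ppi$Gene2 %in% set_b
  b_to_a <- ppi$Gene1 %in% set_b & ppi$Gene2 %in% set_a
  # Keep links between the sets, ignoring self-interactions.
  hits <- ppi[(a_to_b | b_to_a) & ppi$Gene1 != ppi$Gene2, , drop = FALSE]
  touched <- unique(c(hits$Gene1, hits$Gene2))
  list(
    n_links = nrow(hits),                    # PPI relationships between the sets
    a_genes = intersect(set_a, touched),     # members of set A with a link
    b_genes = intersect(set_b, touched),     # members of set B with a link
    shared  = intersect(set_a, set_b)        # genes present in both sets
  )
}

# Toy inputs (assumed data, illustration only).
ppi <- data.frame(
  Gene1 = c("CYCS", "PARK7", "PARK7", "COX5A", "SNCA"),
  Gene2 = c("SNCA", "PRKN", "CYCS", "CYCS", "PRKN"),
  stringsAsFactors = FALSE
)
lcc_genes <- c("CYCS", "PARK7", "COX5A")
pd_genes <- c("SNCA", "PRKN", "PARK7")

res <- link_summary(ppi, lcc_genes, pd_genes)
cat(res$n_links, "links;", length(res$a_genes), "LCC genes;",
    length(res$b_genes), "PD genes; shared:", res$shared, "\n")
```

Note that a gene belonging to both sets, like PARK7 in the toy data, is counted on both sides.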
Although already noted, the VarElect results clearly show the membership of PARK7 in the network. The interesting thing about PARK7 being here is that it is more commonly associated with early-onset or juvenile Parkinson’s disease. Another interesting point is that, although PARK7 PD is early-onset, it’s classified by UniProt as having “slow progression and initial good response to levodopa”.
CYCS, or Cytochrome C, Somatic, is a small heme protein that functions as a central component of the electron transport chain in mitochondria. It is closely linked to Parkinson’s disease in the literature due to its apparent malfunction in the disease.
| Symbol | Description | Category | GIFTs | Matched Phenotypes | Matched Phenotypes Count | Global Rank (Total Genes 11967) | Score | P-Value | Average Disease Causing Likelihood |
|---|---|---|---|---|---|---|---|---|---|
| PARK7 | Parkinsonism Associated Deglycase | Protein Coding | 56 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 5 | 270.87 | ||
| CYCS | Cytochrome C, Somatic | Protein Coding | 57 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 63 | 63.23 | ||
| UBE2E3 | Ubiquitin Conjugating Enzyme E2 E3 | Protein Coding | 51 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 119 | 48.51 | ||
| STXBP1 | Syntaxin Binding Protein 1 | Protein Coding | 57 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 370 | 18.76 | ||
| VDR | Vitamin D Receptor | Protein Coding | 61 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 380 | 18.20 | ||
| COX5A | Cytochrome C Oxidase Subunit 5A | Protein Coding | 53 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 536 | 13.99 | ||
| PGK1 | Phosphoglycerate Kinase 1 | Protein Coding | 59 | Parkinson’s Disease, late-onset, neurodegeneration | 3 | 513 | 12.40 | ||
| NDUFA8 | NADH:Ubiquinone Oxidoreductase Subunit A8 | Protein Coding | 50 | Parkinson’s Disease, LRRK2, late-onset | 3 | 641 | 10.77 | ||
| EIF4EBP1 | Eukaryotic Translation Initiation Factor 4E Binding Protein 1 | Protein Coding | 57 | Parkinson’s Disease, LRRK2, late-onset | 3 | 715 | 9.90 | ||
| DLD | Dihydrolipoamide Dehydrogenase | Protein Coding | 61 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 899 | 9.68 | ||
| PSMC6 | Proteasome 26S Subunit, ATPase 6 | Protein Coding | 50 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 925 | 9.51 | ||
| PLA2G2A | Phospholipase A2 Group IIA | Protein Coding | 56 | Parkinson’s Disease, late-onset, neurodegeneration | 3 | 770 | 9.43 | ||
| RTN4 | Reticulon 4 | Protein Coding | 54 | LRRK2, late-onset, neurodegeneration | 3 | 792 | 9.21 | ||
| NDUFA5 | NADH:Ubiquinone Oxidoreductase Subunit A5 | Protein Coding | 51 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 998 | 8.87 | ||
| UQCRC2 | Ubiquinol-Cytochrome C Reductase Core Protein 2 | Protein Coding | 55 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 1033 | 7.46 | ||
| RAB11A | RAB11A, Member RAS Oncogene Family | Protein Coding | 55 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 1321 | 6.93 | ||
| SFXN1 | Sideroflexin 1 | Protein Coding | 47 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 1241 | 6.34 | ||
| RAB1A | RAB1A, Member RAS Oncogene Family | Protein Coding | 51 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 1282 | 6.14 | ||
| PSMC5 | Proteasome 26S Subunit, ATPase 5 | Protein Coding | 51 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 1311 | 6.03 | ||
| PSMA7 | Proteasome Subunit Alpha 7 | Protein Coding | 55 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 1482 | 5.27 | ||
| SUMO1 | Small Ubiquitin-Like Modifier 1 | Protein Coding | 56 | Parkinson’s Disease, neurodegeneration | 2 | 1277 | 5.04 | ||
| ACLY | ATP Citrate Lyase | Protein Coding | 53 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 1577 | 4.91 | ||
| SDHA | Succinate Dehydrogenase Complex Flavoprotein Subunit A | Protein Coding | 56 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 1850 | 4.77 | ||
| PSMD6 | Proteasome 26S Subunit, Non-ATPase 6 | Protein Coding | 47 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 1696 | 4.56 | ||
| UQCRC1 | Ubiquinol-Cytochrome C Reductase Core Protein 1 | Protein Coding | 50 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 1746 | 4.43 | ||
| PDHB | Pyruvate Dehydrogenase E1 Beta Subunit | Protein Coding | 56 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 1781 | 4.32 | ||
| NR4A1 | Nuclear Receptor Subfamily 4 Group A Member 1 | Protein Coding | 55 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 1865 | 4.09 | ||
| IARS | Isoleucyl-TRNA Synthetase | Protein Coding | 55 | Parkinson’s Disease, neurodegeneration | 2 | 1597 | 3.97 | ||
| ATP6V1E1 | ATPase H+ Transporting V1 Subunit E1 | Protein Coding | 52 | Parkinson’s Disease, LRRK2, late-onset | 3 | 2008 | 3.70 | ||
| MAP2K1 | Mitogen-Activated Protein Kinase Kinase 1 | Protein Coding | 65 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 2197 | 3.60 | ||
| PPP2CA | Protein Phosphatase 2 Catalytic Subunit Alpha | Protein Coding | 56 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 2050 | 3.58 | ||
| TRIM37 | Tripartite Motif Containing 37 | Protein Coding | 51 | neurodegeneration | 1 | 1309 | 3.49 | ||
| UQCRFS1 | Ubiquinol-Cytochrome C Reductase, Rieske Iron-Sulfur Polypeptide 1 | Protein Coding | 51 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 2137 | 3.31 | ||
| VAV1 | Vav Guanine Nucleotide Exchange Factor 1 | Protein Coding | 57 | LRRK2, late-onset | 2 | 1938 | 3.15 | ||
| RUNX2 | Runt Related Transcription Factor 2 | Protein Coding | 57 | late-onset | 1 | 1481 | 3.04 | ||
| CUL3 | Cullin 3 | Protein Coding | 55 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 2349 | 2.62 | ||
| PPIA | Peptidylprolyl Isomerase A | Protein Coding | 57 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 2366 | 2.55 | ||
| UBE2K | Ubiquitin Conjugating Enzyme E2 K | Protein Coding | 51 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 2388 | 2.49 | ||
| COPS5 | COP9 Signalosome Subunit 5 | Protein Coding | 51 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 2450 | 2.31 | ||
| PDHA1 | Pyruvate Dehydrogenase E1 Alpha 1 Subunit | Protein Coding | 59 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 2547 | 2.04 | ||
| DDX1 | DEAD-Box Helicase 1 | Protein Coding | 51 | Parkinson’s Disease, neurodegeneration | 2 | 2528 | 1.71 | ||
| ANXA7 | Annexin A7 | Protein Coding | 51 | late-onset, neurodegeneration | 2 | 2571 | 1.60 | ||
| YARS | Tyrosyl-TRNA Synthetase | Protein Coding | 53 | late-onset, neurodegeneration | 2 | 2576 | 1.60 | ||
| PSMD12 | Proteasome 26S Subunit, Non-ATPase 12 | Protein Coding | 49 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 2710 | 1.50 | ||
| EPS15 | Epidermal Growth Factor Receptor Pathway Substrate 15 | Protein Coding | 54 | Parkinson’s Disease, LRRK2 | 2 | 2701 | 1.26 | ||
| PIK3CB | Phosphatidylinositol-4,5-Bisphosphate 3-Kinase Catalytic Subunit Beta | Protein Coding | 57 | LRRK2, neurodegeneration | 2 | 2746 | 1.11 | ||
| MDH1 | Malate Dehydrogenase 1 | Protein Coding | 55 | Parkinson’s Disease, neurodegeneration | 2 | 2770 | 1.04 | ||
| ATP6V1B2 | ATPase H+ Transporting V1 Subunit B2 | Protein Coding | 56 | LRRK2, neurodegeneration | 2 | 2775 | 1.03 | ||
| GABARAPL2 | GABA Type A Receptor Associated Protein Like 2 | Protein Coding | 55 | Parkinson’s Disease | 1 | 2652 | 0.99 | ||
| IARS2 | Isoleucyl-TRNA Synthetase 2, Mitochondrial | Protein Coding | 49 | Parkinson’s Disease | 1 | 2681 | 0.94 | ||
| STRAP | Serine/Threonine Kinase Receptor Associated Protein | Protein Coding | 45 | LRRK2, neurodegeneration | 2 | 2823 | 0.88 | ||
| FOSL1 | FOS Like 1, AP-1 Transcription Factor Subunit | Protein Coding | 52 | LRRK2, neurodegeneration | 2 | 2826 | 0.87 | ||
| ICK | Intestinal Cell Kinase | Protein Coding | 53 | Parkinson’s Disease | 1 | 2711 | 0.87 | ||
| TXNL1 | Thioredoxin Like 1 | Protein Coding | 47 | late-onset | 1 | 2741 | 0.79 | ||
| TMEM30A | Transmembrane Protein 30A | Protein Coding | 46 | Parkinson’s Disease, LRRK2 | 2 | 2895 | 0.62 | ||
| RRAS | RAS Related | Protein Coding | 52 | LRRK2 | 1 | 2853 | 0.54 | ||
| PRDM4 | PR/SET Domain 4 | Protein Coding | 43 | LRRK2 | 1 | 2853 | 0.54 | ||
| NPTN | Neuroplastin | Protein Coding | 49 | neurodegeneration | 1 | 2878 | 0.49 | ||
| ATP6V1H | ATPase H+ Transporting V1 Subunit H | Protein Coding | 48 | LRRK2 | 1 | 2905 | 0.39 | ||
| NME1 | NME/NM23 Nucleoside Diphosphate Kinase 1 | Protein Coding | 57 | neurodegeneration | 1 | 2920 | 0.34 | ||
| REEP5 | Receptor Accessory Protein 5 | Protein Coding | 46 | LRRK2 | 1 | 2936 | 0.28 | ||
| SUCLA2 | Succinate-CoA Ligase ADP-Forming Beta Subunit | Protein Coding | 57 | LRRK2 | 1 | 2936 | 0.28 | ||
| BCAS2 | BCAS2, Pre-MRNA Processing Factor | Protein Coding | 46 | LRRK2 | 1 | 2938 | 0.27 | ||
| SF3A2 | Splicing Factor 3a Subunit 2 | Protein Coding | 46 | LRRK2 | 1 | 2938 | 0.27 | ||
| HLTF | Helicase Like Transcription Factor | Protein Coding | 48 | LRRK2 | 1 | 2938 | 0.27 | ||
| RAB22A | RAB22A, Member RAS Oncogene Family | Protein Coding | 44 | LRRK2 | 1 | 2938 | 0.27 | ||
| RAN | RAN, Member RAS Oncogene Family | Protein Coding | 54 | neurodegeneration | 1 | 2958 | 0.10 | ||
| PTBP2 | Polypyrimidine Tract Binding Protein 2 | Protein Coding | 47 | neurodegeneration | 1 | 2959 | 0.07 |

| Implicated Symbol | Implicating Symbol | Description | Category | GIFTs | Matched Phenotypes | Matched Phenotypes Count | Global Rank (Total Genes 11967) | Score (Implicated) |
|---|---|---|---|---|---|---|---|---|
| UBE2V2 | PRKN | Parkin RBR E3 Ubiquitin Protein Ligase | Protein Coding | 47 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 3 | 23.14 |
| UBE2V2 | RPS27A | Ribosomal Protein S27a | Protein Coding | 53 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 45 | 23.14 |
| UBE2V2 | FBXO7 | F-Box Protein 7 | Protein Coding | 50 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 16 | 23.14 |
| UBE2V2 | UCHL1 | Ubiquitin C-Terminal Hydrolase L1 | Protein Coding | 62 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 7 | 23.14 |
| UBE2V2 | UBE2E2 | Ubiquitin Conjugating Enzyme E2 E2 | Protein Coding | 49 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 145 | 23.14 |
| SNAP91 | SNCA | Synuclein Alpha | Protein Coding | 61 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 2 | 13.14 |
| SNAP91 | PRKN | Parkin RBR E3 Ubiquitin Protein Ligase | Protein Coding | 47 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 3 | 13.14 |
| SNAP91 | SYNJ1 | Synaptojanin 1 | Protein Coding | 57 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 33 | 13.14 |
| SNAP91 | DNAJC6 | DnaJ Heat Shock Protein Family (Hsp40) Member C6 | Protein Coding | 51 | Parkinson’s Disease, LRRK2, late-onset | 3 | 27 | 13.14 |
| SNAP91 | UCHL1 | Ubiquitin C-Terminal Hydrolase L1 | Protein Coding | 62 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 7 | 13.14 |
| TERF2IP | UCHL1 | Ubiquitin C-Terminal Hydrolase L1 | Protein Coding | 62 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 7 | 12.55 |
| TERF2IP | SNCA | Synuclein Alpha | Protein Coding | 61 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 2 | 12.55 |
| TERF2IP | AKT1 | AKT Serine/Threonine Kinase 1 | Protein Coding | 65 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 79 | 12.55 |
| TERF2IP | RPS27A | Ribosomal Protein S27a | Protein Coding | 53 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 45 | 12.55 |
| TERF2IP | TBP | TATA-Box Binding Protein | Protein Coding | 57 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 40 | 12.55 |
| PYCARD | BAX | BCL2 Associated X, Apoptosis Regulator | Protein Coding | 57 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 107 | 9.78 |
| PYCARD | NLRP3 | NLR Family Pyrin Domain Containing 3 | Protein Coding | 56 | Parkinson’s Disease, late-onset | 2 | 222 | 9.78 |
| PYCARD | PRKN | Parkin RBR E3 Ubiquitin Protein Ligase | Protein Coding | 47 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 3 | 9.78 |
| PYCARD | CASP3 | Caspase 3 | Protein Coding | 61 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 56 | 9.78 |
| PYCARD | SQSTM1 | Sequestosome 1 | Protein Coding | 59 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 48 | 9.78 |
| SRPRB | SNCA | Synuclein Alpha | Protein Coding | 61 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 2 | 8.42 |
| SRPRB | PRKN | Parkin RBR E3 Ubiquitin Protein Ligase | Protein Coding | 47 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 3 | 8.42 |
| SRPRB | RPS27A | Ribosomal Protein S27a | Protein Coding | 53 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 45 | 8.42 |
| SRPRB | UCHL1 | Ubiquitin C-Terminal Hydrolase L1 | Protein Coding | 62 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 7 | 8.42 |
| SRPRB | SNCAIP | Synuclein Alpha Interacting Protein | Protein Coding | 52 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 8 | 8.42 |
| EIF1B | EIF4G1 | Eukaryotic Translation Initiation Factor 4 Gamma 1 | Protein Coding | 55 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 36 | 7.94 |
| EIF1B | UCHL1 | Ubiquitin C-Terminal Hydrolase L1 | Protein Coding | 62 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 7 | 7.94 |
| EIF1B | RPS27A | Ribosomal Protein S27a | Protein Coding | 53 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 45 | 7.94 |
| EIF1B | SOD2 | Superoxide Dismutase 2 | Protein Coding | 60 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 78 | 7.94 |
| EIF1B | GAPDH | Glyceraldehyde-3-Phosphate Dehydrogenase | Protein Coding | 58 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 98 | 7.94 |
| MRPL15 | PARK7 | Parkinsonism Associated Deglycase | Protein Coding | 56 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 5 | 7.78 |
| MRPL15 | RPS27A | Ribosomal Protein S27a | Protein Coding | 53 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 45 | 7.78 |
| MRPL15 | NDUFS4 | NADH:Ubiquinone Oxidoreductase Subunit S4 | Protein Coding | 52 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 31 | 7.78 |
| MRPL15 | TH | Tyrosine Hydroxylase | Protein Coding | 62 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 24 | 7.78 |
| MRPL15 | EIF4G1 | Eukaryotic Translation Initiation Factor 4 Gamma 1 | Protein Coding | 55 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 36 | 7.78 |
| LYL1 | GDNF | Glial Cell Derived Neurotrophic Factor | Protein Coding | 59 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 52 | 7.74 |
| LYL1 | PLAU | Plasminogen Activator, Urokinase | Protein Coding | 63 | LRRK2, late-onset | 2 | 162 | 7.74 |
| LYL1 | NFKB1 | Nuclear Factor Kappa B Subunit 1 | Protein Coding | 63 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 409 | 7.74 |
| LYL1 | ZEB1 | Zinc Finger E-Box Binding Homeobox 1 | Protein Coding | 57 | late-onset | 1 | 273 | 7.74 |
| LYL1 | BAX | BCL2 Associated X, Apoptosis Regulator | Protein Coding | 57 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 107 | 7.74 |
| XRCC2 | RPS27A | Ribosomal Protein S27a | Protein Coding | 53 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 45 | 6.43 |
| XRCC2 | NBN | Nibrin | Protein Coding | 55 | Parkinson’s Disease, late-onset, neurodegeneration | 3 | 388 | 6.43 |
| XRCC2 | TP53 | Tumor Protein P53 | Protein Coding | 65 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 201 | 6.43 |
| XRCC2 | COMT | Catechol-O-Methyltransferase | Protein Coding | 61 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 34 | 6.43 |
| XRCC2 | ATM | ATM Serine/Threonine Kinase | Protein Coding | 64 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 498 | 6.43 |
| BMP8B | PLA2G6 | Phospholipase A2 Group VI | Protein Coding | 57 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 12 | 6.27 |
| BMP8B | BDNF | Brain Derived Neurotrophic Factor | Protein Coding | 59 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 42 | 6.27 |
| BMP8B | MAPK14 | Mitogen-Activated Protein Kinase 14 | Protein Coding | 62 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 89 | 6.27 |
| BMP8B | IL10 | Interleukin 10 | Protein Coding | 57 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 243 | 6.27 |
| BMP8B | GDNF | Glial Cell Derived Neurotrophic Factor | Protein Coding | 59 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 52 | 6.27 |
| LAPTM4B | GBA | Glucosylceramidase Beta | Protein Coding | 59 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 11 | 5.59 |
| LAPTM4B | ABCB1 | ATP Binding Cassette Subfamily B Member 1 | Protein Coding | 61 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 449 | 5.59 |
| LAPTM4B | NEDD4 | Neural Precursor Cell Expressed, Developmentally Down-Regulated 4, E3 Ubiquitin Protein Ligase | Protein Coding | 55 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 669 | 5.59 |
| LAPTM4B | SLC11A2 | Solute Carrier Family 11 Member 2 | Protein Coding | 58 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 132 | 5.59 |
| LAPTM4B | NFE2L2 | Nuclear Factor, Erythroid 2 Like 2 | Protein Coding | 56 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 351 | 5.59 |
| ADSS | GBA | Glucosylceramidase Beta | Protein Coding | 59 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 11 | 4.39 |
| ADSS | PLA2G6 | Phospholipase A2 Group VI | Protein Coding | 57 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 12 | 4.39 |
| ADSS | GAD1 | Glutamate Decarboxylase 1 | Protein Coding | 61 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 120 | 4.39 |
| ADSS | APOE | Apolipoprotein E | Protein Coding | 61 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 13 | 4.39 |
| ADSS | MAPK14 | Mitogen-Activated Protein Kinase 14 | Protein Coding | 62 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 89 | 4.39 |
| BTBD3 | CUL1 | Cullin 1 | Protein Coding | 55 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 143 | 1.47 |
| BTBD3 | SKP1 | S-Phase Kinase Associated Protein 1 | Protein Coding | 52 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 310 | 1.47 |
| BTBD3 | E2F1 | E2F Transcription Factor 1 | Protein Coding | 53 | Parkinson’s Disease, LRRK2, neurodegeneration | 3 | 1777 | 1.47 |
| BTBD3 | DAPK1 | Death Associated Protein Kinase 1 | Protein Coding | 57 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 823 | 1.47 |
| BTBD3 | PLCB1 | Phospholipase C Beta 1 | Protein Coding | 57 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 1019 | 1.47 |
| RTCA | LMNB1 | Lamin B1 | Protein Coding | 57 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 228 | 1.23 |
| RTCA | NOP56 | NOP56 Ribonucleoprotein | Protein Coding | 51 | Parkinson’s Disease, late-onset, neurodegeneration | 3 | 373 | 1.23 |
| RTCA | IL6 | Interleukin 6 | Protein Coding | 59 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 197 | 1.23 |
| RTCA | HNRNPA1 | Heterogeneous Nuclear Ribonucleoprotein A1 | Protein Coding | 56 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 538 | 1.23 |
| RTCA | TNF | Tumor Necrosis Factor | Protein Coding | 63 | Parkinson’s Disease, LRRK2, late-onset, neurodegeneration | 4 | 185 | 1.23 |

KEGG
WikiPathways
Reactome
GO Cellular Component
GO Biological Process
GO Molecular Function